  • ImmunoGlobe: enabling systems immunology with a manually curated intercellular immune interaction network. BMC bioinformatics Atallah, M. B., Tandon, V., Hiam, K. J., Boyce, H., Hori, M., Atallah, W., Spitzer, M. H., Engleman, E., Mallick, P. 2020; 21 (1): 346


    BACKGROUND: While technological advances have made it possible to profile the immune system at high resolution, translating high-throughput data into knowledge of immune mechanisms has been challenged by the complexity of the interactions underlying immune processes. Tools to explore the immune network are critical for better understanding the multi-layered processes that underlie immune function and dysfunction, but such tools require a standardized network map of immune interactions. To facilitate this, we have developed ImmunoGlobe, a manually curated intercellular immune interaction network extracted from Janeway's Immunobiology textbook. RESULTS: ImmunoGlobe is the first graphical representation of the immune interactome and comprises 253 immune system components and 1112 unique immune interactions with detailed functional and characteristic annotations. Analysis of this network shows that it recapitulates known features of the human immune system and can be used to uncover novel multi-step immune pathways, examine species-specific differences in immune processes, and predict the response of immune cells to stimuli. ImmunoGlobe is publicly available through a user-friendly interface at and can be downloaded as a computable graph and network table. CONCLUSION: While the fields of proteomics and genomics have long benefited from network analysis tools, no such tool yet exists for immunology. ImmunoGlobe provides a ground-truth immune interaction network upon which such tools can be built. These tools will allow us to predict the outcome of complex immune interactions, providing mechanistic insight that allows us to precisely modulate immune responses in health and disease.
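
    The ImmunoGlobe abstract describes using the network to uncover multi-step immune pathways. One simple way to picture this is to treat components as nodes and interactions as directed edges, then enumerate short paths between two components. The sketch below assumes exactly that representation; the function name `find_pathways`, the `max_steps` cutoff, and the toy node labels are hypothetical and are not taken from ImmunoGlobe or its downloadable files.

```python
from collections import deque


def find_pathways(edges, source, target, max_steps=3):
    """Enumerate simple directed paths from source to target with at most max_steps edges.

    edges: iterable of (upstream, downstream) pairs; a hypothetical stand-in for
    a component-to-component interaction table.
    """
    # Build an adjacency list from the edge table.
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)

    paths = []
    queue = deque([[source]])  # breadth-first, so shorter paths are found first
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 >= max_steps:
            continue  # path already uses the maximum number of edges
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple: never revisit a node
                queue.append(path + [nxt])
    return paths


# Toy network with placeholder component names (not ImmunoGlobe data).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
print(find_pathways(toy_edges, "A", "D"))
```

Breadth-first expansion returns shorter pathways before longer ones, and the `max_steps` cutoff bounds how far the search explores from the source.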

  • Geostatistical visualization of ecological interactions in tumors. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine Boyce, H. B., Mallick, P. 2019; 2019: 2741–49


    Recent advances in our understanding of cancer progression have highlighted the roles played by molecular heterogeneity and by the tumor microenvironment in driving drug resistance and metastasis. The coupling of single-cell measurement technologies with algorithms such as t-SNE and SPADE has enabled deep investigation of tumor heterogeneity. However, such techniques capture only molecular heterogeneity and enable neither the quantification nor the visualization of intercellular interactions. Nor do they allow visualization of the ecological niches that are critical to understanding tumor behavior. Novel computational tools to quantify and visualize spatial patterns in the tumor microenvironment are therefore critically needed. Here, we take a tumor ecology perspective to examine how predation, mutualism, commensalism, and parasitism may affect tumor development and spatial patterning. We additionally use geostatistics to quantify local spatial heterogeneity and the emergent global spatial behavior of the models. By visualizing emergent spatial patterns, we demonstrate the potential utility of geostatistical analysis in differentiating among cell-cell interactions in the tumor microenvironment. These studies introduce both an ecological framework for characterizing intercellular interactions in cancer and a novel way of quantifying and visualizing spatial patterns in cancer.
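
    As an illustration of what a geostatistical summary of spatial patterning can look like, the sketch below computes a classical (Matheron) empirical semivariogram: for each distance bin, it takes half the mean squared difference in a cell-level value over all pairs of locations whose separation falls in that bin. Whether the paper uses this particular estimator is an assumption; the function name, the binning scheme, and the toy data are hypothetical.

```python
import math


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Classical (Matheron) semivariogram estimator.

    Bin k collects pairs whose distance d satisfies k*bin_width <= d < (k+1)*bin_width,
    and gamma_k = sum((z_i - z_j)**2) / (2 * N_k). Bins with no pairs return None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            k = int(math.dist(points[i], points[j]) // bin_width)
            if k < n_bins:  # ignore pairs farther apart than the last bin
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Toy data: cells on a line; value 1 marks one hypothetical cell type, 0 another.
cells = [(0, 0), (1, 0), (2, 0), (3, 0)]
alternating = [0, 1, 0, 1]
print(empirical_semivariogram(cells, alternating, 1.0, 4))
```

In this toy example, an alternating pattern of values along a line gives semivariance that rises and falls with lag, whereas a spatially uniform field gives zero in every populated bin.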

  • Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility Srivastava, A., Adusumilli, R., Boyce, H., Garijo, D., Ratnakar, V., Mayani, R., Yu, T., Machiraju, R., Gil, Y., Mallick, P., Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2019: 208–19


    Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and the Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, a challenge is posted, and competitors then make predictions on blinded test data. Challengers submit their answers to a central server, where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers (units of software that package code together with all of its dependencies) to be run on the cloud. Despite the great value of benchmark challenges as an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance their impact. Specifically, current approaches evaluate only end-to-end performance, so it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, owing to a lack of specifics, ambiguity in tools and parameters, and problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, because the proposed workflows are not explicitly defined, which makes it cumbersome to understand how data flow through and are used in each entry. Here we introduce an approach that overcomes these limitations, based on the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge entries. When entries are submitted as workflows, it becomes possible to compare not just a challenger's results and performance but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools with only subtle changes in parameters (and radical differences in results).
    WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This is especially critical in bioinformatics workflows, where default or incorrect parameter values can drastically alter results. Different challenge entries can be readily compared through abstract workflows, which also facilitate reuse. WINGS is housed on a cloud-based setup that stores data, dependencies, and workflows for easy sharing and use. It can also scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.
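
    One motivation in the WINGS abstract is comparing challenge entries at the level of methodology rather than only results. Assuming each entry is recorded as an ordered list of components with their parameter settings, a minimal comparison can align steps by position and report where components or parameter values differ. This is a hypothetical sketch, not the WINGS API or data model; the function name, component names, and parameters are made up for illustration.

```python
def diff_workflows(wf_a, wf_b):
    """Compare two workflows step by step.

    Each workflow is a list of (component_name, params_dict) tuples (a hypothetical format).
    Returns (step_index, field, value_in_a, value_in_b) tuples for every difference.
    """
    diffs = []
    for i in range(max(len(wf_a), len(wf_b))):
        step_a = wf_a[i] if i < len(wf_a) else None
        step_b = wf_b[i] if i < len(wf_b) else None
        name_a = step_a[0] if step_a else None
        name_b = step_b[0] if step_b else None
        if name_a != name_b:
            # Different (or missing) component at this position.
            diffs.append((i, "component", name_a, name_b))
            continue
        params_a, params_b = step_a[1], step_b[1]
        for key in sorted(set(params_a) | set(params_b)):
            if params_a.get(key) != params_b.get(key):
                diffs.append((i, key, params_a.get(key), params_b.get(key)))
    return diffs


# Two hypothetical entries that share tools but differ in parameters.
entry_a = [("normalize", {"method": "quantile"}), ("impute", {"k": 10}),
           ("predict", {"model": "regression", "alpha": 0.1})]
entry_b = [("normalize", {"method": "quantile"}), ("impute", {"k": 5}),
           ("predict", {"model": "regression", "alpha": 0.01})]
print(diff_workflows(entry_a, entry_b))
```

Positional alignment is the simplest possible choice. It would misreport entries that insert or reorder steps, which is one reason a richer abstract-workflow representation is useful.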

  • A Temporal Examination of Platelet Counts as a Predictor of Prognosis in Lung, Prostate, and Colon Cancer Patients. Scientific reports Sylman, J. L., Boyce, H. B., Mitrugno, A., Tormoen, G. W., Thomas, I. C., Wagner, T. H., Lee, J. S., Leppert, J. T., McCarty, O. J., Mallick, P. 2018; 8 (1): 6564


    Beyond their role in hemostasis, platelets present in excess (>400 K/μL, thrombocytosis) have been associated with worse outcomes in lung, ovarian, breast, renal, and colorectal cancer patients. These associations between thrombocytosis and cancer outcomes have been drawn mostly from single-time-point studies, often at the time of diagnosis. Using laboratory data from the Department of Veterans Affairs (VA), we examined the potential benefit of using longitudinal platelet counts to improve patient prognosis predictions. Ten features (summary statistics and engineered features) were derived to describe the platelet counts of 10,000+ VA lung, prostate, and colon cancer patients and incorporated into an age-adjusted LASSO regression analysis to determine feature importance and to predict overall or relapse-free survival. These predictions were compared with the previously used approach of monitoring for thrombocytosis near diagnosis (Postdiag AG400 model). Temporal features describing acute increases and decreases in platelet count were found to be important for overall and relapse-free survival and helped stratify cancer patient groups by good and poor outcomes. Predictions of overall and relapse-free survival improved by up to 30% compared with the Postdiag AG400 model. Our study indicates that temporally derived platelet count features are associated with patient prognosis and can inform prognosis predictions.
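
    To make the idea of temporal platelet features concrete, the sketch below derives a few summary and change-based features from one patient's dated platelet counts, using the >400 K/μL thrombocytosis threshold given in the platelet abstract. The specific features and their names are hypothetical stand-ins for the paper's ten features, which are not listed in the abstract; the age-adjusted LASSO regression that consumed the features in the study is not reproduced here.

```python
from statistics import mean

THROMBOCYTOSIS_THRESHOLD = 400  # K/uL; counts above this indicate thrombocytosis


def platelet_features(series):
    """series: list of (day, count_in_K_per_uL) tuples for one patient.

    Returns a dict of hypothetical summary and temporal features.
    """
    series = sorted(series)  # order measurements by day
    counts = [count for _, count in series]
    # Rate of change between consecutive draws (K/uL per day); same-day repeats are skipped.
    slopes = [
        (c2 - c1) / (t2 - t1)
        for (t1, c1), (t2, c2) in zip(series, series[1:])
        if t2 > t1
    ]
    return {
        "mean": mean(counts),
        "max": max(counts),
        "frac_thrombocytosis": sum(c > THROMBOCYTOSIS_THRESHOLD for c in counts) / len(counts),
        "max_rise_per_day": max(slopes, default=0.0),  # sharpest acute increase
        "max_fall_per_day": min(slopes, default=0.0),  # sharpest acute decrease
    }


# Hypothetical patient: four platelet counts taken 10 days apart.
print(platelet_features([(0, 300), (10, 450), (20, 420), (30, 250)]))
```

Single-time-point approaches see only one of these counts; change-based features such as the largest rise or fall per day summarize how the count evolves across draws.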

  • Towards Continuous Scientific Data Analysis and Hypothesis Evolution Thirty-First AAAI Conference on Artificial Intelligence Gil, Y., Garijo, D., Ratnakar, V., Mayani, R., Adusumilli, R., Boyce, H., Srivastava, A., Mallick, P. 2017
  • Automated Hypothesis Testing with Large Scientific Data Repositories Annual Conference on Advances in Cognitive Systems Gil, Y., Garijo, D., Ratnakar, V., Mayani, R., Adusumilli, R., Boyce, H., Mallick, P. 2016