I am a PhD Candidate in Biomedical Informatics at Stanford University School of Medicine. I graduated from the Indian Institute of Technology, Kharagpur under the Dual Degree Programme - Bachelor (Honours) and Master of Technology, in Biotechnology and Biochemical Engineering. While at IIT, I qualified for the prestigious Google Summer of Code Program for three successive years. I have contributed to Drupal, an open-source content management platform, and to the Genome Informatics - Reactome Project, a knowledgebase of biological pathways.
I am interested in research at the intersection of Biosciences, Big Data and the Web. After graduating, I joined the Digital Enterprise Research Institute (DERI), Ireland, in its Health Care and Life Sciences Unit. I was responsible for developing user-driven platforms that facilitate intuitive data exploration for the EU FP7 GRANATUM Project, the Linked TCGA Project and Ireland's Open Data Initiative (Data.gov.ie). I was part of the team that won the Best Paper Award at CSHALS 2014 and the Semantic Web Challenge Award (Big Data Prize) at ISWC 2013.
Honors & Awards
Remote Student Teaching Excellence Award, Stanford University (May 2018)
NSF Travel Award, 1st United States Semantic Technologies Symposium (March 2018)
Distinguished Paper Award, AMIA Annual Symposium 2017 (November 2017)
Student Best Resource Paper Award, 16th International Semantic Web Conference (October 2017)
NSF Travel Award, 16th International Semantic Web Conference (October 2017)
Best Poster Award (Graphic Design), Stanford Biomedical Informatics Program (September 2017)
NIH/NLM Travel Award, 21st Pacific Symposium on Biocomputing (January 2016)
Best Project Award (Graphic Design), Stanford Biomedical Informatics Program (September 2015)
Best Paper Award, 7th Conference on Semantics in Healthcare and Life Sciences (February 2014)
Semantic Web Challenge Award (Big Data Prize), 12th International Semantic Web Conference (October 2013)
Best Project Award, 10th Summer School on Ontology Engineering and the Semantic Web (July 2013)
Best Poster Award, 10th Summer School on Ontology Engineering and the Semantic Web (July 2013)
Google Summer of Code Student (Genome Informatics - Reactome), Google Inc. (August 2012)
Honourable Mention in Technology, Indian Institute of Technology (IIT), Kharagpur (April 2012)
Best Outgoing Student (Technology), Meghnad Saha Hall of Residence, IIT Kharagpur (April 2012)
Google Summer of Code Student (Genome Informatics - Reactome), Google Inc. (August 2011)
Google Summer of Code Student (Drupal), Google Inc. (August 2010)
Xavierite Super Award, St. Xavier’s High School, Ahmedabad (February 2007)
BiOnIC: A Catalog of User Interactions with Biomedical Ontologies
International Semantic Web Conference
DOI: 10.1007/978-3-319-68204-4_13
PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
International Conference on World Wide Web
DOI: 10.1145/3038912.3052692
Analyzing user interactions with biomedical ontologies: A visual perspective
Journal of Web Semantics
2018; 49: 16–30
Biomedical ontologies are large: Several ontologies in the BioPortal repository contain thousands or even hundreds of thousands of entities. The development and maintenance of such large ontologies is difficult. To support ontology authors and repository developers in their work, it is crucial to improve our understanding of how these ontologies are explored, queried, reused, and used in downstream applications by biomedical researchers. We present an exploratory empirical analysis of user activities in the BioPortal ontology repository by analyzing BioPortal interaction logs across different access modes over several years. We investigate how users of BioPortal query and search for ontologies and their classes, how they explore the ontologies, and how they reuse classes from different ontologies. Additionally, through three real-world scenarios, we not only analyze the usage of ontologies for annotation tasks but also compare it to the browsing and querying behaviors of BioPortal users. For our investigation, we use several different visualization techniques. To inspect large amounts of interaction, reuse, and real-world usage data at a glance, we make use of and extend PolygOnto, a visualization method that has been successfully used to analyze reuse of ontologies in previous work. Our results show that exploration, query, reuse, and actual usage behaviors rarely align, suggesting that different users tend to explore, query and use different parts of an ontology. Finally, we highlight and discuss differences and commonalities among users of BioPortal.
DOI: 10.1016/j.websem.2017.12.002
Web of Science ID: 000428090300002
PubMed ID: 29657560
PubMed Central ID: PMC5895104
MRI to MGMT: predicting methylation status in glioblastoma patients using convolutional recurrent neural networks.
Pacific Symposium on Biocomputing
2018; 23: 331–42
Glioblastoma Multiforme (GBM), a malignant brain tumor, is among the most lethal of all cancers. Temozolomide is the primary chemotherapy treatment for patients diagnosed with GBM. The methylation status of the promoter or the enhancer regions of the O6-methylguanine methyltransferase (MGMT) gene may impact the efficacy and sensitivity of temozolomide, and hence may affect overall patient survival. Microscopic genetic changes may manifest as macroscopic morphological changes in the brain tumors that can be detected using magnetic resonance imaging (MRI), which can serve as noninvasive biomarkers for determining methylation of MGMT regulatory regions. In this research, we use a compendium of brain MRI scans of GBM patients collected from The Cancer Imaging Archive (TCIA) combined with methylation data from The Cancer Genome Atlas (TCGA) to predict the methylation state of the MGMT regulatory regions in these patients. Our approach relies on a bi-directional convolutional recurrent neural network architecture (CRNN) that leverages the spatial aspects of these 3-dimensional MRI scans. Our CRNN obtains an accuracy of 67% on the validation data and 62% on the test data, with precision and recall both at 67%, suggesting the existence of MRI features that may complement existing markers for GBM patient stratification and prognosis. We have additionally presented our model via a novel neural network visualization platform, which we have developed to improve interpretability of deep learning MRI-based classification models.
PubMed ID: 29218894
Mechanism-based Pharmacovigilance over the Life Sciences Linked Open Data Cloud.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2017; 2017: 1014–23
Adverse drug reactions (ADR) result in significant morbidity and mortality in patients, and a substantial proportion of these ADRs are caused by drug-drug interactions (DDIs). Pharmacovigilance methods are used to detect unanticipated DDIs and ADRs by mining Spontaneous Reporting Systems, such as the US FDA Adverse Event Reporting System (FAERS). However, these methods do not provide mechanistic explanations for the discovered drug-ADR associations in a systematic manner. In this paper, we present a systems pharmacology-based approach to perform mechanism-based pharmacovigilance. We integrate data and knowledge from four different sources using Semantic Web Technologies and Linked Data principles to generate a systems network. We present a network-based Apriori algorithm for association mining in FAERS reports. We evaluate our method against existing pharmacovigilance methods for three different validation sets. Our method has AUROC statistics of 0.7-0.8, similar to current methods, and event-specific thresholds generate AUROC statistics greater than 0.75 for certain ADRs. Finally, we discuss the benefits of using Semantic Web technologies to attain the objectives for mechanism-based pharmacovigilance.
PubMed ID: 29854169
PRISM: A Data-Driven Platform for Monitoring Mental Health.
Pacific Symposium on Biocomputing
2016; 21: 333-344
Neuropsychiatric disorders are the leading cause of disability worldwide and there is no gold standard currently available for the measurement of mental health. This issue is exacerbated by the fact that the information physicians use to diagnose these disorders is episodic and often subjective. Current methods to monitor mental health involve the use of subjective DSM-5 guidelines, and advances in EEG and video monitoring technologies have not been widely adopted due to invasiveness and inconvenience. Wearable technologies have surfaced as a ubiquitous and unobtrusive method for providing continuous, quantitative data about a patient. Here, we introduce PRISM (Passive, Real-time Information for Sensing Mental Health). This platform integrates motion, light and heart rate data from a smart watch application with user interactions and text entries from a web application. We have demonstrated a proof of concept by collecting preliminary data through a pilot study of 13 subjects. We have engineered appropriate features and applied both unsupervised and supervised learning to develop models that are predictive of user-reported ratings of their emotional state, demonstrating that the data has the potential to be useful for evaluating mental health. This platform could allow patients and clinicians to leverage continuous streams of passive data for early and accurate diagnosis as well as constant monitoring of patients suffering from mental disorders.
PubMed ID: 26776198
- A Systematic Analysis on Term Reuse and Term Overlap across Biomedical Ontologies. Semantic Web - Interoperability, Usability, Applicability. 2016
An Ebola virus-centered knowledge base.
Database : the journal of biological databases and curation
Ebola virus (EBOV), of the family Filoviridae viruses, is a NIAID category A, lethal human pathogen. It is responsible for causing Ebola virus disease (EVD) that is a severe hemorrhagic fever and has a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard.
DOI: 10.1093/database/bav049
PubMed ID: 26055098
PubMed Central ID: PMC4460400
- Investigating Term Reuse and Overlap in Biomedical Ontologies. 6th International Conference on Biomedical Ontology (ICBO). 2015
ReVeaLD: A user-driven domain-specific interactive search platform for biomedical research
Journal of Biomedical Informatics
2014; 47: 112-130
DOI: 10.1016/j.jbi.2013.10.001
Web of Science ID: 000333004500012
PubMed ID: 24135450
- GenomeSnip: Fragmenting the Genomic Wheel to augment discovery in cancer research. 7th Conference on Semantics in Healthcare and Life Sciences (CSHALS). 2014
The Reactome pathway knowledgebase.
Nucleic acids research
2014; 42 (Database issue): D472-7
Reactome (http://www.reactome.org) is a manually curated open-source open-data resource of human pathways and reactions. The current version 46 describes 7088 human proteins (34% of the predicted human proteome), participating in 6744 reactions based on data extracted from 15 107 research publications with PubMed links. The Reactome Web site and analysis tool set have been completely redesigned to increase speed, flexibility and user friendliness. The data model has been extended to support annotation of disease processes due to infectious agents and to mutation.
DOI: 10.1093/nar/gkt1102
PubMed ID: 24243840
PubMed Central ID: PMC3965010
- LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery. 4th Workshop on Linked Science co-located with 13th International Semantic Web Conference. 2014: 48–59
- A Roadmap for navigating the Life Sciences Linked Open Data Cloud. 4th Joint International Semantic Technology (JIST) Conference. 2014
- Open Data Ireland: Data Audit Report. Open Data Ireland Support Project. 2014
Big linked cancer data: Integrating linked TCGA and PubMed
Web Semantics: Science, Services and Agents on the World Wide Web
DOI: 10.1016/j.websem.2014.07.004
Linked Biomedical Dataspace: Lessons Learned Integrating Data for Drug Discovery
13th International Semantic Web Conference (ISWC)
DOI: 10.1007/978-3-319-11964-9_8
Identification of an Extracellular Antifungal Protein from the Endophytic Fungus Colletotrichum sp DM06
Protein and Peptide Letters
2013; 20 (2): 173-179
An extracellular antifungal protein of 28 kDa (exAFP-C28) was identified from an endophytic fungus Colletotrichum sp. DM-06. After purification, the MIC value of exAFP-C28 against Candida albicans, a well-known human pathogenic fungus, was found to be 32 μg/mL, a concentration that did not affect human red blood cells. The antifungal activity associated with exAFP-C28 was manifested by the increased membrane permeability of C. albicans cells followed by disruption. Proteomics and bioinformatics analyses revealed that several peptide fragments of exAFP-C28 have identity with the bacterial 50S ribosomal protein L10, and a stretch of 55 amino acids of two peptide fragments corresponding to the N-terminus of L10 protein is capable of forming an amphipathic helix required for membrane penetration. Taken together, our results suggest that the exAFP-C28 protein from Colletotrichum sp. DM-06 is a promising therapeutic agent in controlling candidiasis disease in animals including humans.
Web of Science ID: 000316859400008
PubMed ID: 22894154
- Fostering Serendipity through Big Linked Data. 12th International Semantic Web Conference (ISWC). 2013