Doctor of Philosophy, Wayne State University (2015)
Master of Science, Universita Degli Studi Di Pisa (2006)
Purvesh Khatri, Postdoctoral Faculty Sponsor
A new computational drug repurposing method using established disease-drug pair knowledge.
Bioinformatics (Oxford, England)
MOTIVATION: Drug repurposing is a potential alternative to the classical drug discovery pipeline. Repurposing involves finding novel indications for already approved drugs. In this work, we present a novel machine learning-based method for drug repurposing. This method explores the anti-similarity between drugs and a disease to uncover new uses for the drugs. More specifically, our proposed method takes into account three sources of information: i) large scale gene expression profiles corresponding to human cell lines treated with small molecules, ii) gene expression profile of a human disease and iii) the known relationship between FDA-approved drugs and diseases. Using these data, our proposed method learns a similarity metric through a supervised machine learning-based algorithm such that a disease and its associated FDA-approved drugs have smaller distance than the other disease-drug pairs. RESULTS: We validated our framework by showing that the proposed method incorporating distance metric learning technique can retrieve FDA-approved drugs for their approved indications. Once validated, we used our approach to identify a few strong candidates for repurposing. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btz156
View details for PubMedID 30840053
- Comparison of the Transcriptomic Signature of Pediatric Vs. Adult CML and Normal Bone Marrow Stem Cells AMER SOC HEMATOLOGY. 2018
A novel computational approach for drug repurposing using systems biology
2018; 34 (16): 2817–25
Identification of novel therapeutic effects for existing US Food and Drug Administration (FDA)-approved drugs, drug repurposing, is an approach aimed to dramatically shorten the drug discovery process, which is costly, slow and risky. Several computational approaches use transcriptional data to find potential repurposing candidates. The main hypothesis of such approaches is that if gene expression signature of a particular drug is opposite to the gene expression signature of a disease, that drug may have a potential therapeutic effect on the disease. However, this may not be optimal since it fails to consider the different roles of genes and their dependencies at the system level. We propose a systems biology approach to discover novel therapeutic roles for established drugs that addresses some of the issues in the current approaches. To do so, we use publicly available drug and disease data to build a drug-disease network by considering all interactions between drug targets and disease-related genes in the context of all known signaling pathways. This network is integrated with gene-expression measurements to identify drugs with new desired therapeutic effects based on a system-level analysis method. We compare the proposed approach with the drug repurposing approach proposed by Sirota et al. on four human diseases: idiopathic pulmonary fibrosis, non-small cell lung cancer, prostate cancer and breast cancer. We evaluate the proposed approach based on its ability to re-discover drugs that are already FDA-approved for a given disease. The R package DrugDiseaseNet is under review for publication in Bioconductor and is available at https://github.com/azampvd/DrugDiseaseNet. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/bty133
View details for Web of Science ID 000441730900015
View details for PubMedID 29534151
View details for PubMedCentralID PMC6084573
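The signature-reversal hypothesis described in the abstract above (a drug whose expression signature is opposite to the disease signature may be therapeutic) can be illustrated with a minimal sketch. This is not the DrugDiseaseNet method and not the approach of Sirota et al.; it only ranks drugs by Pearson correlation with a disease signature. The function names, the drug names, and the toy log fold-change values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_drugs_by_reversal(disease_signature, drug_signatures):
    """Return (drug, correlation) pairs, most anti-correlated drug first.

    Both signatures are assumed to list log fold changes for the same
    genes in the same order (an assumption of this sketch).
    """
    scores = {
        name: pearson(disease_signature, signature)
        for name, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data: four genes, three hypothetical drugs.
disease = [2.0, -1.0, 0.5, -2.0]
drugs = {
    "drug_a": [-2.0, 1.0, -0.5, 2.0],   # exact reversal of the disease
    "drug_b": [2.0, -1.0, 0.5, -2.0],   # mimics the disease
    "drug_c": [0.1, 0.2, -0.1, 0.0],    # largely unrelated
}
ranked = rank_drugs_by_reversal(disease, drugs)
```

In this toy ranking the drug that exactly reverses the disease signature comes first. As the abstract notes, a plain correlation of this kind ignores the different roles of genes and their dependencies, which is the limitation the network-based approach is meant to address.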
Fragile Histidine Triad (FHIT), a Novel Modifier Gene in Pulmonary Arterial Hypertension.
American journal of respiratory and critical care medicine
RATIONALE: Pulmonary arterial hypertension (PAH) is characterized by progressive narrowing of pulmonary arteries resulting in right heart failure and death. Bone Morphogenetic Protein Receptor type-2 (BMPR2) mutations account for most familial PAH (FPAH) forms while reduced BMPR2 is present in many idiopathic PAH (IPAH) forms, suggesting dysfunctional BMPR2 signaling to be a key feature of PAH. Modulating BMPR2 signaling is therapeutically promising, yet how BMPR2 is downregulated in PAH is unclear. OBJECTIVES: We intended to identify and pharmaceutically target BMPR2 modifier genes to improve PAH. METHODS: We combined siRNA High Throughput Screening (HTS) of >20,000 genes with a multi-cohort analysis of publicly available PAH RNA expression data to identify clinically relevant BMPR2-modifiers. After confirming gene dysregulation in PAH patient tissue, we determined the functional roles of BMPR2-modifiers in vitro and tested the repurposed drug Enzastaurin for its propensity to improve experimental PH. MEASUREMENTS AND MAIN RESULTS: We discovered Fragile Histidine Triad (FHIT) as a novel BMPR2-modifier. BMPR2 and FHIT expression were reduced in PAH patients. FHIT reductions were associated with endothelial and smooth muscle cell dysfunction, rescued by Enzastaurin through a dual mechanism: upregulation of FHIT as well as miR17-5 repression. Fhit-/- mice had exaggerated hypoxic PH and failed to recover in normoxia. Enzastaurin reversed PH in the Sugen5416/Hypoxia/Normoxia rat model, by improving Right Ventricular Systolic Pressure (RVSP), RV hypertrophy, cardiac fibrosis and vascular remodeling. CONCLUSIONS: This study highlights the importance of the novel BMPR2 modifier FHIT in PH and the clinical value of the repurposed drug Enzastaurin as a potential novel therapeutic strategy to improve PAH.
View details for DOI 10.1164/rccm.201712-2553OC
View details for PubMedID 30107138
Single-cell epigenetics - Chromatin modification atlas unveiled by mass cytometry.
Clinical immunology (Orlando, Fla.)
Modifications of histone proteins are fundamental to the regulation of epigenetic phenotypes. Dysregulations of histone modifications have been linked to the pathogenesis of diverse human diseases. However, identifying differential histone modifications in patients with immune-mediated diseases has been challenging, in part due to the lack of a powerful analytic platform to study histone modifications in the complex human immune system. We recently developed a highly multiplexed platform, Epigenetic landscape profiling using cytometry by Time-Of-Flight (EpiTOF), to analyze the global levels of a broad array of histone modifications in single cells using mass cytometry. In this review, we summarize the development of EpiTOF and discuss its potential applications in biomedical research. We anticipate that this platform will provide new insights into the roles of epigenetic regulation in hematopoiesis, immune cell functions and immune system aging, and reveal aberrant epigenetic patterns associated with immune-mediated diseases.
View details for DOI 10.1016/j.clim.2018.06.009
View details for PubMedID 29960011
Single-Cell Chromatin Modification Profiling Reveals Increased Epigenetic Variations with Aging.
Post-translational modifications of histone proteins and exchanges of histone variants of chromatin are central to the regulation of nearly all DNA-templated biological processes. However, the degree and variability of chromatin modifications in specific human immune cells remain largely unknown. Here, we employ a highly multiplexed mass cytometry analysis to profile the global levels of a broad array of chromatin modifications in primary human immune cells at the single-cell level. Our data reveal markedly different cell-type- and hematopoietic-lineage-specific chromatin modification patterns. Differential analysis between younger and older adults shows that aging is associated with increased heterogeneity between individuals and elevated cell-to-cell variability in chromatin modifications. Analysis of a twin cohort unveils heritability of chromatin modifications and demonstrates that aging-related chromatin alterations are predominantly driven by non-heritable influences. Together, we present a powerful platform for chromatin and immunology research. Our discoveries highlight the profound impacts of aging on chromatin modifications.
View details for DOI 10.1016/j.cell.2018.03.079
View details for PubMedID 29706550
Inflammatory macrophage-associated 3-gene signature predicts subclinical allograft injury and graft survival.
2018; 3 (2)
Late allograft failure is characterized by cumulative subclinical insults manifesting over many years. Although immunomodulatory therapies targeting host T cells have improved short-term survival rates, rates of chronic allograft loss remain high. We hypothesized that other immune cell types may drive subclinical injury, ultimately leading to graft failure. We collected whole-genome transcriptome profiles from 15 independent cohorts composed of 1,697 biopsy samples to assess the association of an inflammatory macrophage polarization-specific gene signature with subclinical injury. We applied penalized regression to a subset of the data sets and identified a 3-gene inflammatory macrophage-derived signature. We validated discriminatory power of the 3-gene signature in 3 independent renal transplant data sets with mean AUC of 0.91. In a longitudinal cohort, the 3-gene signature strongly correlated with extent of injury and accurately predicted progression of subclinical injury 18 months before clinical manifestation. The 3-gene signature also stratified patients at high risk of graft failure as soon as 15 days after biopsy. We found that the 3-gene signature also distinguished acute rejection (AR) accurately in 3 heart transplant data sets but not in lung transplant. Overall, we identified a parsimonious signature capable of diagnosing AR, recognizing subclinical injury, and risk-stratifying renal transplant patients. Our results strongly suggest that inflammatory macrophages may be a viable therapeutic target to improve long-term outcomes for organ transplantation patients.
View details for DOI 10.1172/jci.insight.95659
View details for PubMedID 29367465
Unsupervised Analysis of Transcriptomics in Bacterial Sepsis Across Multiple Datasets Reveals Three Robust Clusters.
Critical care medicine
OBJECTIVES: To find and validate generalizable sepsis subtypes using data-driven clustering. DESIGN: We used advanced informatics techniques to pool data from 14 bacterial sepsis transcriptomic datasets from eight different countries (n = 700). SETTING: Retrospective analysis. SUBJECTS: Persons admitted to the hospital with bacterial sepsis. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: A unified clustering analysis across 14 discovery datasets revealed three subtypes, which, based on functional analysis, we termed "Inflammopathic, Adaptive, and Coagulopathic." We then validated these subtypes in nine independent datasets from five different countries (n = 600). In both discovery and validation data, the Adaptive subtype is associated with a lower clinical severity and lower mortality rate, and the Coagulopathic subtype is associated with higher mortality and clinical coagulopathy. Further, these clusters are statistically associated with clusters derived by others in independent single sepsis cohorts. CONCLUSIONS: The three sepsis subtypes may represent a unifying framework for understanding the molecular heterogeneity of the sepsis syndrome. Further study could potentially enable a precision medicine approach of matching novel immunomodulatory therapies with septic patients most likely to benefit.
View details for DOI 10.1097/CCM.0000000000003084
View details for PubMedID 29537985
An approach to infer putative disease-specific mechanisms using neighboring gene networks
2017; 33 (13): 1987–94
The ultimate goal of any experiment is to understand the biological phenomena underlying the condition investigated. This process often results in a gene network through which a certain biological mechanism is explained. Such networks have been proven to be extremely useful for the prediction of mechanisms of action of drugs or the responses of an organism to a specific impact (e.g. a disease, a treatment, etc.). Here, we introduce an approach able to build a network that captures the putative mechanisms at play in the given condition, by using datasets from multiple experiments studying the same phenotype. This method takes advantage of known interactions extracted from multiple sources such as protein-protein interactions and curated biological pathways. Based on such prior knowledge, we overcome the drawbacks of snap-shot data by considering the possible effects of each gene on its neighbors. We show the effectiveness of this approach in three different case studies and validate the results in two ways considering the identified genes and interactions between them. We compare our findings with the results of two widely-used methods in the same category as well as the classical approach of selecting differentially expressed (DE) genes in an investigated condition. The results show that 'neighbor-net' analysis is able to report biological mechanisms that are significantly relevant to the given diseases in all the three case studies, and performs better compared to all reference methods using both validation approaches. The proposed method is implemented in R and will be available as a Bioconductor package. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btx097
View details for Web of Science ID 000404054700011
View details for PubMedID 28200075
View details for PubMedCentralID PMC5870849
Identifying biologically relevant putative mechanisms in a given phenotype comparison
2017; 12 (5): e0176950
A major challenge in life science research is understanding the mechanism involved in a given phenotype. The ability to identify the correct mechanisms is needed in order to understand fundamental and very important phenomena such as mechanisms of disease, immune system responses to various challenges, and mechanisms of drug action. The current data analysis methods focus on the identification of the differentially expressed (DE) genes using their fold change and/or p-values. Major shortcomings of this approach are that: i) it does not consider the interactions between genes; ii) its results are sensitive to the selection of the threshold(s) used, and iii) the set of genes produced by this approach is not always conducive to formulating mechanistic hypotheses. Here we present a method that can construct networks of genes that can be considered putative mechanisms. The putative mechanisms constructed by this approach are not limited to the set of DE genes, but also consider all known and relevant gene-gene interactions. We analyzed three real datasets for which both the causes of the phenotype, as well as the true mechanisms were known. We show that the method identified the correct mechanisms when applied on microarray datasets from mouse. We compared the results of our method with the results of the classical approach, showing that our method produces more meaningful biological insights.
View details for DOI 10.1371/journal.pone.0176950
View details for Web of Science ID 000401314000029
View details for PubMedID 28486531
View details for PubMedCentralID PMC5423614
Dysregulated Innate Immune Response Is a Robust Predictor of Allograft Injury and Survival Across All Transplanted Organs
WILEY. 2017: 406
View details for Web of Science ID 000404515702563
A Novel Pathway Analysis Approach Based on the Unexplained Disregulation of Genes
PROCEEDINGS OF THE IEEE
2017; 105 (3): 482–95
A crucial step in the understanding of any phenotype is the correct identification of the signaling pathways that are significantly impacted in that phenotype. However, most current pathway analysis methods produce both false positives as well as false negatives in certain circumstances. We hypothesized that such incorrect results are due to the fact that the existing methods fail to distinguish between the primary dis-regulation of a given gene itself and the effects of signaling coming from upstream. Furthermore, a modern whole-genome experiment performed with a next-generation technology spends a great deal of effort to measure the entire set of 30,000-100,000 transcripts in the genome. This is followed by the selection of a few hundred differentially expressed genes, a step that literally discards more than 99% of the collected data. We also hypothesized that such a drastic filtering could discard many genes that play crucial roles in the phenotype. We propose a novel topology-based pathway analysis method that identifies significantly impacted pathways using the entire set of measurements, thus allowing the full use of the data provided by NGS techniques. The results obtained on 24 real data sets involving 12 different human diseases, as well as on 8 yeast knock-out data sets show that the proposed method yields significant improvements with respect to the state-of-the-art methods: SPIA, GSEA and GSA. Primary dis-regulation analysis is implemented in R and included in ROntoTools Bioconductor package (versions ≥ 2.0.0). https://www.bioconductor.org/packages/release/bioc/html/ROntoTools.html.
View details for DOI 10.1109/JPROC.2016.2531000
View details for Web of Science ID 000395894900008
View details for PubMedID 30337764
View details for PubMedCentralID PMC6190577
- Pathway crosstalk effects: shrinkage and disentanglement using a Bayesian hierarchical model STATISTICS IN BIOSCIENCES 2016; 8 (2): 374–94
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
2016; 11 (3): e0152333
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward's minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward's and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository.
View details for DOI 10.1371/journal.pone.0152333
View details for Web of Science ID 000372708900053
View details for PubMedID 27015427
View details for PubMedCentralID PMC4807765
A novel bi-level meta-analysis approach: applied to biological pathway analysis
2016; 32 (3): 409–16
The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenges in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors to correctly identify pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from email@example.com. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btv588
View details for Web of Science ID 000370203000012
View details for PubMedID 26471455
View details for PubMedCentralID PMC5006307
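The abstract above contrasts Fisher's and Stouffer's methods with the additive method combined with the Central Limit Theorem. The sketch below shows textbook versions of the three combination rules so that their different sensitivity to a single outlying P-value can be seen on toy numbers. It is not the authors' bi-level framework or their R scripts; the function names and the example P-values are assumptions made for illustration, and the normal approximation used for the additive method is rough for so few studies.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fisher_combined(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has a
    closed form, so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # test statistic divided by 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def stouffer_combined(p_values):
    """Stouffer's method: average the z-scores, then convert back.

    P-values must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()
    z = sum(-std_normal.inv_cdf(p) for p in p_values) / sqrt(len(p_values))
    return 1.0 - std_normal.cdf(z)


def additive_clt_combined(p_values):
    """Additive method with a normal (CLT) approximation.

    Under the null, the sum of k uniform P-values has mean k/2 and
    variance k/12; a small sum gives a small combined P-value.
    """
    k = len(p_values)
    z = (sum(p_values) - k / 2) / sqrt(k / 12)
    return NormalDist().cdf(z)


# Four unremarkable studies plus one extreme outlier.
p_values = [1e-10, 0.6, 0.7, 0.8, 0.9]
p_fisher = fisher_combined(p_values)
p_stouffer = stouffer_combined(p_values)
p_additive = additive_clt_combined(p_values)
```

On these toy values the single extreme study dominates Fisher's combined P-value, while the additive combination stays far from significance, which is the kind of outlier sensitivity the abstract refers to.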
Ontologies for Bioinformatics
SPRINGER HANDBOOK OF BIO-/NEUROINFORMATICS
View details for Web of Science ID 000401725400030
Assessing co-regulation of directly linked genes in biological networks using microarray time series analysis
2013; 114 (2): 149–54
Differential expression of genes detected with the analysis of high throughput genomic experiments is a commonly used intermediate step for the identification of signaling pathways involved in the response to different biological conditions. The impact analysis was the first approach for the analysis of signaling pathways involved in a certain biological process that was able to take into account not only the magnitude of the expression change of the genes but also the topology of signaling pathways including the type of each interaction between the genes. In the impact analysis, signaling pathways are represented as weighted directed graphs with genes as nodes and the interactions between genes as edges. Edge weights are represented by a β factor, the regulatory efficiency, which is assumed to be equal to 1 in inductive interactions between genes and equal to -1 in repressive interactions. This study presents a similarity analysis between gene expression time series aimed to find correspondences with the regulatory efficiency, i.e. the β factor as found in a widely used pathway database. Here, we focused on correlations among genes directly connected in signaling pathways, assuming that the expression variations of upstream genes impact immediately downstream genes in a short time interval and without significant influences by the interactions with other genes. Time series were processed using three different similarity metrics. The first metric is based on the bit string matching; the second one is a specific application of the Dynamic Time Warping to detect similarities even in the presence of stretching and delays; the third one is a quantitative comparative analysis resulting from an evaluation of frequency domain representation of time series: the similarity metric is the correlation between dominant spectral components.
These three approaches are tested on real data and pathways, and a comparison is performed using Information Retrieval benchmark tools, indicating the frequency approach as the best similarity metric among the three, for its ability to detect the correlation based on the correspondence of the most significant frequency components.
View details for DOI 10.1016/j.biosystems.2013.07.006
View details for Web of Science ID 000327106200005
View details for PubMedID 23876997
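Of the three similarity metrics named in the abstract above, Dynamic Time Warping (DTW) is the one that tolerates stretching and delays between two time series. The sketch below is the classic dynamic-programming formulation with an absolute-difference cost. It is a generic illustration, not the specific application used in the study; the function name and the toy series are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: deletion, insertion, or match.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# A downstream profile that repeats the upstream pulse one step later.
upstream = [0, 1, 0]
downstream_delayed = [0, 0, 1, 0]
delay_distance = dtw_distance(upstream, downstream_delayed)

# Two flat profiles at different levels cannot be warped onto each other.
offset_distance = dtw_distance([0, 0], [1, 1])
```

Because warping absorbs the one-step delay, the delayed pulse is at distance zero from the original, whereas a pointwise comparison would penalize it. One hypothetical way to probe a repressive interaction (β = -1) would be to negate the downstream series before computing the distance.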
Analysis and correction of crosstalk effects in pathway analysis
2013; 23 (11): 1885–93
Identifying the pathways that are significantly impacted in a given condition is a crucial step in understanding the underlying biological phenomena. All approaches currently available for this purpose calculate a P-value that aims to quantify the significance of the involvement of each pathway in the given phenotype. These P-values were previously thought to be independent. Here we show that this is not the case, and that many pathways can considerably affect each other's P-values through a "crosstalk" phenomenon. Although it is intuitive that various pathways could influence each other, the presence and extent of this phenomenon have not been rigorously studied and, most importantly, there is no currently available technique able to quantify the amount of such crosstalk. Here, we show that all three major categories of pathway analysis methods (enrichment analysis, functional class scoring, and topology-based methods) are severely influenced by crosstalk phenomena. Using real pathways and data, we show that in some cases pathways with significant P-values are not biologically meaningful, and that some biologically meaningful pathways with nonsignificant P-values become statistically significant when the crosstalk effects of other pathways are removed. We describe a technique able to detect, quantify, and correct crosstalk effects, as well as identify independent functional modules. We assessed this novel approach on data from four experiments involving three phenotypes and two species. This method is expected to allow a better understanding of individual experiment results, as well as a more refined definition of the existing signaling pathways for specific phenotypes.
View details for DOI 10.1101/gr.153551.112
View details for Web of Science ID 000326642500012
View details for PubMedID 23934932
View details for PubMedCentralID PMC3814888
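The crosstalk phenomenon described in the abstract above can be made concrete with a toy over-representation (hypergeometric) test: a pathway may look significant only because of the genes it shares with another pathway. The sketch below is not the authors' detection and correction technique; it simply recomputes an enrichment P-value after dropping shared genes. The gene identifiers, pathway contents, and universe size are invented for illustration.

```python
from math import comb


def ora_p_value(pathway, de_genes, universe_size):
    """One-sided hypergeometric P-value for over-representation.

    Returns P(X >= k), where k is the observed number of differentially
    expressed (DE) genes in the pathway and X is the count expected when
    the DE genes are drawn at random from the universe.
    """
    k = len(pathway & de_genes)
    n_path = len(pathway)
    n_de = len(de_genes)
    upper = min(n_path, n_de)
    tail = sum(
        comb(n_path, i) * comb(universe_size - n_path, n_de - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(universe_size, n_de)


universe_size = 100
de_genes = {f"g{i}" for i in range(10)}                   # g0..g9 are DE

# Pathway A contains 8 DE genes; pathway B shares 5 of them with A
# and has no DE genes of its own.
pathway_a = {f"g{i}" for i in range(8)} | {"g10", "g11"}
pathway_b = {f"g{i}" for i in range(3, 8)} | {f"g{i}" for i in range(20, 25)}

p_b_raw = ora_p_value(pathway_b, de_genes, universe_size)
p_b_without_shared = ora_p_value(pathway_b - pathway_a, de_genes, universe_size)
```

In this toy setup pathway B appears enriched only through the genes it shares with pathway A; once those are set aside, no evidence for B remains. The published method goes further by quantifying and correcting such effects rather than simply discarding shared genes.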
Novel gene-bases and pathway analysis GWAS methods for the identification of new candidate genes and pathways for type 2 diabetes
SPRINGER. 2013: S157
View details for Web of Science ID 000329196901029
Methods and approaches in the topology-based analysis of biological pathways
FRONTIERS IN PHYSIOLOGY
2013; 4: 278
The goal of pathway analysis is to identify the pathways significantly impacted in a given phenotype. Many current methods are based on algorithms that consider pathways as simple gene lists, dramatically under-utilizing the knowledge that such pathways are meant to capture. During the past few years, a plethora of methods claiming to incorporate various aspects of the pathway topology have been proposed. These topology-based methods, sometimes referred to as "third generation," have the potential to better model the phenomena described by pathways. Although there is now a large variety of approaches used for this purpose, no review is currently available to offer guidance for potential users and developers. This review covers 22 such topology-based pathway analysis methods published in the last decade. We compare these methods based on: type of pathways analyzed (e.g., signaling or metabolic), input (subset of genes, all genes, fold changes, gene p-values, etc.), mathematical models, pathway scoring approaches, output (one or more pathway scores, p-values, etc.) and implementation (web-based, standalone, etc.). We identify and discuss challenges, arising both in methodology and in pathway representation, including inconsistent terminology, different data formats, lack of meaningful benchmarks, and the lack of tissue and condition specificity.
View details for DOI 10.3389/fphys.2013.00278
View details for Web of Science ID 000346774000275
View details for PubMedID 24133454
View details for PubMedCentralID PMC3794382
A Genetic Algorithms Framework for Estimating Individual Gene Contributions in Signaling Pathways
IEEE. 2013: 650–57
View details for Web of Science ID 000326235300084
- Incorporating gene significance in the impact analysis of signaling pathways IEEE. 2012: 126–31
A method for analysis and correction of cross-talk effects in pathway analysis
View details for Web of Science ID 000309341302013
The Biological Connection Markup Language: a SBGN-compliant format for visualization, filtering and analysis of biological pathways
2011; 27 (15): 2127–33
Many models and analyses of signaling pathways have been proposed. However, none of them takes into account that a biological pathway is not a fixed system, but instead depends on the organism, tissue and cell type as well as on physiological, pathological and experimental conditions. The Biological Connection Markup Language (BCML) is a format to describe, annotate and visualize pathways. BCML is able to store multiple types of information, permitting a selective view of the pathway as it exists and/or behaves in specific organisms, tissues and cells. Furthermore, BCML can be automatically converted into data formats suitable for analysis and into a fully SBGN-compliant graphical representation, making it an important tool that can be used by both computational biologists and 'wet lab' scientists. The XML schema and the BCML software suite are freely available under the LGPL for download at http://bcml.dc-atlas.net. They are implemented in Java and supported on MS Windows, Linux and OS X.
View details for DOI 10.1093/bioinformatics/btr339
View details for Web of Science ID 000292778700015
View details for PubMedID 21653523
View details for PubMedCentralID PMC3137220
Signaling Pathways Coupling Phenomena
View details for Web of Science ID 000287421403120