Honors & Awards
First Prize in 10th IDRBT Doctoral Colloquium, IDRBT, Hyderabad, India (December, 2020)
Best Paper Award in IEEE CINE Conference, KIIT, Odisha, India (October, 2017)
Ph.D. Dissertation Fellowship, Indian Statistical Institute (July, 2015)
University Gold Medal, Jadavpur University, Kolkata, India (December, 2015)
Doctor of Philosophy, Indian Statistical Institute (2022)
Master of Technology, Jadavpur University (2015)
Bachelor of Engineering, University of Burdwan (2012)
Summer Han, Postdoctoral Faculty Sponsor
Approximate Graph Laplacians for Multimodal Data Clustering
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
2021; 43 (3): 798-813
One of the important approaches to handling data heterogeneity in multimodal data clustering is to model each modality using a separate similarity graph. Information from the multiple graphs is then integrated by combining them into a unified graph. A major challenge here is how to preserve cluster information while removing noise from the individual graphs. In this regard, a novel algorithm, termed CoALa, is proposed that integrates noise-free approximations of multiple similarity graphs. The proposed method first approximates a graph using the most informative eigenpairs of its Laplacian, which contain cluster information. The approximate Laplacians are then integrated to construct a low-rank subspace that best preserves the overall cluster information of multiple graphs. However, this approximate subspace differs from the full-rank subspace, which integrates information from all the eigenpairs of each Laplacian. Matrix perturbation theory is used to evaluate theoretically how far the approximate subspace deviates from the full-rank one for a given value of the approximation rank. Finally, spectral clustering is performed on the approximate subspace to identify the clusters. Experimental results on several real-life cancer and benchmark data sets demonstrate that the proposed algorithm significantly and consistently outperforms state-of-the-art integrative clustering approaches.
View details for DOI 10.1109/TPAMI.2019.2945574
View details for Web of Science ID 000616309900004
View details for PubMedID 31603770
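The graph-approximation idea above can be illustrated with a short NumPy sketch. This is not the CoALa algorithm itself. It assumes symmetric, non-negative affinity matrices and approximates each graph by the top-r eigenpairs of its normalized affinity matrix D^{-1/2} W D^{-1/2}, which are the bottom eigenpairs of the normalized Laplacian I - D^{-1/2} W D^{-1/2}. It then simply averages the approximations and runs spectral clustering with a basic k-means. The function names (`normalized_affinity`, `low_rank_approx`, `integrate_and_cluster`) are hypothetical.

```python
import numpy as np


def normalized_affinity(W):
    """Return D^{-1/2} W D^{-1/2}; its top eigenvectors are the bottom
    eigenvectors of the normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def low_rank_approx(S, r):
    """Keep only the r largest eigenpairs of a symmetric matrix S."""
    vals, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    vals, vecs = vals[-r:], vecs[:, -r:]
    return (vecs * vals) @ vecs.T


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def integrate_and_cluster(graphs, k, r):
    """Average rank-r approximations of all graphs, then spectral-cluster."""
    joint = sum(low_rank_approx(normalized_affinity(W), r) for W in graphs) / len(graphs)
    _, vecs = np.linalg.eigh(joint)
    U = vecs[:, -k:]                                   # top-k joint eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalize the embedding
    return kmeans(U, k)
```

Averaging is the simplest possible integration rule and is used here only for brevity. The rank r controls how much of each graph's spectrum is kept, and therefore how much of it is discarded as noise.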
Multi-Manifold Optimization for Multi-View Subspace Clustering
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
The meaningful patterns embedded in high-dimensional multi-view data sets typically have a much more compact representation that often lies close to a low-dimensional manifold. Identification of hidden structures in such data mainly depends on properly modeling the geometry of low-dimensional manifolds. In this regard, this article presents a manifold optimization-based integrative clustering algorithm for multi-view data. To identify consensus clusters, the algorithm constructs a joint graph Laplacian that contains denoised cluster information of the individual views. It optimizes a joint clustering objective while reducing the disagreement between the cluster structures conveyed by the joint and individual views. The optimization is performed alternately over the k-means and Stiefel manifolds. The Stiefel manifold helps to model the nonlinearities and differential clusters within the individual views, whereas the k-means manifold tries to elucidate the best-fit joint cluster structure of the data. A gradient-based movement is performed separately on the manifold of each view so that individual nonlinearity is preserved while looking for shared cluster information. The convergence of the proposed algorithm is established over the manifold, and an asymptotic convergence bound is obtained to quantify theoretically how fast the sequence of iterates generated by the algorithm converges to an optimal solution. Integrative clustering on benchmark and multi-omics cancer data sets demonstrates that the proposed algorithm outperforms state-of-the-art multi-view clustering approaches.
View details for DOI 10.1109/TNNLS.2021.3054789
View details for Web of Science ID 000732924000001
View details for PubMedID 33606638
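The sketch below is a hedged illustration of optimization over a Stiefel manifold, the set of n x k matrices U with U^T U = I. It runs Riemannian gradient ascent on trace(U^T S U) for a symmetric matrix S, such as a graph affinity. At each step it projects the Euclidean gradient onto the tangent space and maps the result back onto the manifold with a QR retraction. Only a single-view Stiefel step is shown. The alternation with the k-means manifold and the multi-view objective from the paper are not reproduced. The function names and the step size are assumptions.

```python
import numpy as np


def sym(A):
    """Symmetric part of a square matrix."""
    return (A + A.T) / 2


def project_tangent(U, G):
    """Project a Euclidean gradient G onto the tangent space of the Stiefel
    manifold at U (embedded metric)."""
    return G - U @ sym(U.T @ G)


def qr_retract(Y):
    """Map Y back onto the manifold via QR, fixing column signs so that
    diag(R) is non-negative and the retraction is unique."""
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)


def stiefel_ascent(S, k, step=0.1, iters=500, seed=0):
    """Maximize trace(U^T S U) over n x k orthonormal U by Riemannian
    gradient ascent with a QR retraction."""
    rng = np.random.default_rng(seed)
    U = qr_retract(rng.standard_normal((S.shape[0], k)))
    for _ in range(iters):
        egrad = 2 * S @ U                      # Euclidean gradient of trace(U^T S U)
        rgrad = project_tangent(U, egrad)      # Riemannian gradient
        if np.linalg.norm(rgrad) < 1e-10:
            break
        U = qr_retract(U + step * rgrad)
    return U
```

For this particular objective, the maximizer spans the top-k eigenvectors of S. That makes the sketch easy to check against `np.linalg.eigh`.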
Selective Update of Relevant Eigenspaces for Integrative Clustering of Multimodal Data
IEEE TRANSACTIONS ON CYBERNETICS
One of the major problems in cancer subtype discovery from multimodal omic data is that not all of the available modalities may encode relevant and homogeneous information about the subtypes. Moreover, the high-dimensional nature of the modalities makes sample clustering computationally expensive. In this regard, a novel algorithm is proposed to extract a low-rank joint subspace of the integrated data matrix. The proposed algorithm first evaluates the quality of subtype information provided by each of the modalities, and then judiciously selects only the relevant ones to construct the joint subspace. The problem of incrementally updating the singular value decomposition of a data matrix is formulated for the multimodal data framework. The analytical formulation enables efficient construction of the joint subspace of the integrated data from the low-rank subspaces of the individual modalities. Constructing the joint subspace with the proposed method is shown to be computationally more efficient than performing principal component analysis (PCA) on the integrated data matrix. New quantitative indices are introduced to measure theoretically the accuracy of the subspace constructed by the proposed approach with respect to the principal subspace extracted by PCA. Clustering on the joint subspace constructed by the proposed algorithm is shown to be more effective than existing integrative clustering approaches on several real-life multimodal cancer data sets.
View details for DOI 10.1109/TCYB.2020.2990112
View details for PubMedID 32452799
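The incremental-SVD idea can be sketched by appending one modality's feature columns at a time to an existing thin SVD. The sketch follows the generic column-update construction: project the new block onto the current left subspace, factor the orthogonal residual, then take the SVD of a small core matrix. It is not the paper's exact formulation, and it does not implement the paper's accuracy indices. Samples are rows and modalities are column blocks. `append_columns` and `joint_subspace` are hypothetical names.

```python
import numpy as np


def append_columns(U, s, B, rank=None, tol=1e-10):
    """Given a thin SVD A = U diag(s) V^T (V not stored), return the left
    singular vectors and singular values of [A, B], optionally truncated."""
    P = U.T @ B                              # component of B inside span(U)
    R = B - U @ P                            # component orthogonal to span(U)
    Ur, sr, Vrt = np.linalg.svd(R, full_matrices=False)
    keep = sr > tol                          # drop numerically zero directions
    J = Ur[:, keep]                          # new orthonormal directions, J ⟂ U
    K = sr[keep, None] * Vrt[keep]           # coordinates of R in basis J
    k = len(s)
    M = np.block([[np.diag(s), P],
                  [np.zeros((J.shape[1], k)), K]])   # small core matrix
    Um, sm, _ = np.linalg.svd(M, full_matrices=False)
    U_new = np.hstack([U, J]) @ Um
    if rank is not None:
        U_new, sm = U_new[:, :rank], sm[:rank]
    return U_new, sm


def joint_subspace(blocks, rank=None):
    """Build the left singular subspace of hstack(blocks) one block at a time."""
    U, s, _ = np.linalg.svd(blocks[0], full_matrices=False)
    if rank is not None:
        U, s = U[:, :rank], s[:rank]
    for B in blocks[1:]:
        U, s = append_columns(U, s, B, rank)
    return U, s
```

With `rank=None`, the update is exact, so the singular values match `np.linalg.svd(np.hstack(blocks))`. Passing a small `rank` keeps only a low-rank joint subspace after each modality, which keeps the core SVD small at the cost of an approximation.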
Low-Rank Joint Subspace Construction for Cancer Subtype Discovery
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
2020; 17 (4): 1290-1302
Multimodal data integration is an important framework for cancer subtype discovery, as it can blend the inherent properties of individual modalities with their cross-platform correlations to infer clinically relevant subtypes. The main problem here is the appropriate selection of relevant and complementary modalities. Another problem is the 'high dimension-low sample size' nature of each modality. This work proposes a novel algorithm to construct a low-rank joint subspace from the low-rank subspaces of individual high-dimensional modalities. Statistical hypothesis testing is introduced to estimate the rank of each modality effectively by separating the signal component from its noise counterpart. Two quantitative indices are proposed to evaluate the quality of different modalities: the first assesses the degree of relevance of the cluster structure embedded within each modality, while the second evaluates the amount of cluster information shared between two modalities. To construct the joint subspace, the algorithm selects the most relevant modalities with maximum shared information. During data integration, the intersection between two subspaces is also considered in order to select cluster information and filter out the noise from different subspaces. The efficacy of clustering on the joint subspace extracted by the proposed algorithm is compared with that of several existing integrative clustering approaches on real-life multimodal data sets. Experimental results show that the identified subtypes more closely resemble the clinically established subtypes than those identified by the existing approaches. Survival analysis reveals significant differences between the survival profiles of the identified subtypes, while robustness analysis shows that the identified subtypes are not sensitive to perturbation of the data sets.
View details for DOI 10.1109/TCBB.2019.2894635
View details for Web of Science ID 000556777900018
View details for PubMedID 30676972
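Principal angles are one standard way to quantify how much cluster information two modality subspaces share and to extract their intersection. The singular values of U1^T U2 are the cosines of the angles between span(U1) and span(U2). The sketch below illustrates that idea under stated assumptions. It does not reproduce the paper's indices or its hypothesis-testing rank estimator. The rank r is taken as given, and the helper names and the 0.9 threshold are hypothetical.

```python
import numpy as np


def orthonormal_basis(X, r):
    """Top-r left singular vectors of a (samples x features) modality block."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r]


def principal_angle_cosines(U1, U2):
    """Cosines of the principal angles between span(U1) and span(U2);
    both inputs must have orthonormal columns."""
    return np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), 0.0, 1.0)


def shared_directions(U1, U2, threshold=0.9):
    """Principal vectors of span(U1) whose angle to span(U2) has a cosine
    above the threshold, i.e. an approximate subspace intersection."""
    Y, c, _ = np.linalg.svd(U1.T @ U2, full_matrices=False)
    return U1 @ Y[:, c > threshold]
```

Cosines near 1 indicate nearly shared directions. Directions with small cosines are specific to one modality and, in a denoising scheme, could be treated as candidates to filter out.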
Principal Subspace Updation for Integrative Clustering of Multimodal Omics Data
International Conference on Computational Intelligence and Networks (CINE)
IEEE. 2017: 99-104
View details for DOI 10.1109/CINE.2017.14