Soumya Kundu
Ph.D. Student in Computer Science, admitted Autumn 2018
Education & Certifications
-
Master of Science, University of Connecticut, Computer Science and Engineering (2018)
-
Bachelor of Science, University of Connecticut, Computer Science and Engineering (2018)
All Publications
-
Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders
Cell
2024; Published online September 30, 2024
View details for DOI 10.1016/j.cell.2024.09.014
-
Single-nucleus chromatin accessibility profiling highlights regulatory mechanisms of coronary artery disease risk.
Nature Genetics
2022
Abstract
Coronary artery disease (CAD) is a complex inflammatory disease involving genetic influences across cell types. Genome-wide association studies have identified over 200 loci associated with CAD, where the majority of risk variants reside in noncoding DNA sequences impacting cis-regulatory elements. Here, we applied single-nucleus assay for transposase-accessible chromatin with sequencing to profile 28,316 nuclei across coronary artery segments from 41 patients with varying stages of CAD, which revealed 14 distinct cellular clusters. We mapped ~320,000 accessible sites across all cells, identified cell-type-specific elements and transcription factors, and prioritized functional CAD risk variants. We identified elements in smooth muscle cell transition states (for example, fibromyocytes) and functional variants predicted to alter smooth muscle cell- and macrophage-specific regulation of MRAS (3q22) and LIPA (10q23), respectively. We further nominated key driver transcription factors such as PRDM16 and TBX2. Together, this single-nucleus atlas provides a critical step towards interpreting regulatory mechanisms across the continuum of CAD risk.
View details for DOI 10.1038/s41588-022-01069-0
View details for PubMedID 35590109
-
Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer's and Parkinson's diseases.
Nature Genetics
2020
Abstract
Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer's and Parkinson's diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson's disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.
View details for DOI 10.1038/s41588-020-00721-x
View details for PubMedID 33106633
-
Assessing the accuracy of phylogenetic rooting methods on prokaryotic gene families
PLOS ONE
2020; 15 (5): e0232950
Abstract
Almost all standard phylogenetic methods for reconstructing gene trees result in unrooted trees; yet, many of the most useful applications of gene trees require that the gene trees be correctly rooted. As a result, several computational methods have been developed for inferring the root of unrooted gene trees. However, the accuracy of such methods has never been systematically evaluated on prokaryotic gene families, where horizontal gene transfer is often one of the dominant evolutionary events driving gene family evolution. In this work, we address this gap by conducting a thorough comparative evaluation of five different rooting methods using large collections of both simulated and empirical prokaryotic gene trees. Our simulation study is based on 6000 true and reconstructed gene trees on 100 species and characterizes the rooting accuracy of the four methods under 36 different evolutionary conditions and 3 levels of gene tree reconstruction error. The empirical study is based on a large, carefully designed data set of 3098 gene trees from 504 bacterial species (406 Alphaproteobacteria and 98 Cyanobacteria) and reveals insights that supplement those gleaned from the simulation study. Overall, this work provides several valuable insights into the accuracy of the considered methods that will help inform the choice of rooting methods to use when studying microbial gene family evolution. Among other findings, this study identifies parsimonious Duplication-Transfer-Loss (DTL) rooting and Minimal Ancestor Deviation (MAD) rooting as two of the most accurate gene tree rooting methods for prokaryotes and specifies the evolutionary conditions under which these methods are most accurate, demonstrates that DTL rooting is highly sensitive to high evolutionary rates and gene tree error, and that rooting methods based on branch-lengths are generally robust to gene tree reconstruction error.
View details for DOI 10.1371/journal.pone.0232950
View details for Web of Science ID 000537496000029
View details for PubMedID 32413061
View details for PubMedCentralID PMC7228096
-
SaGePhy: An improved phylogenetic simulation framework for gene and subgene evolution.
Bioinformatics (Oxford, England)
2019
Abstract
SaGePhy is a software package for improved phylogenetic simulation of gene and subgene evolution. SaGePhy can be used to generate species trees, gene trees, and subgene or (protein) domain trees using a probabilistic birth-death process that allows for gene and subgene duplication, horizontal gene and subgene transfer, and gene and subgene loss. SaGePhy implements a range of important features not found in other phylogenetic simulation frameworks/software. These include (i) simulation of subgene or domain level evolution inside one or more gene trees, (ii) simultaneous simulation of both additive and replacing horizontal gene/subgene transfers, and (iii) probabilistic sampling of species tree and gene tree nodes, respectively, for gene-family and domain-family birth. SaGePhy is open-source, platform independent, and written in Java and Python. Executables, source code (open-source under the revised BSD licence), and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/sagephy/.
View details for DOI 10.1093/bioinformatics/btz081
View details for PubMedID 30715213
-
RANGER-DTL 2.0: rigorous reconstruction of gene-family evolution by duplication, transfer and loss
Bioinformatics
2018; 34 (18): 3214–16
Abstract
RANGER-DTL 2.0 is a software program for inferring gene family evolution using Duplication-Transfer-Loss reconciliation. This new software is highly scalable and easy to use, and offers many new features not currently available in any other reconciliation program. RANGER-DTL 2.0 has a particular focus on reconciliation accuracy and can account for many sources of reconciliation uncertainty including uncertain gene tree rooting, gene tree topological uncertainty, multiple optimal reconciliations and alternative event cost assignments. RANGER-DTL 2.0 is open-source and written in C++ and Python. Pre-compiled executables, source code (open-source under GNU GPL) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/bty314
View details for Web of Science ID 000446433800022
View details for PubMedID 29688310
View details for PubMedCentralID PMC6137995
-
On the impact of uncertain gene tree rooting on duplication-transfer-loss reconciliation
BMC Bioinformatics
2018: 290
Abstract
Duplication-Transfer-Loss (DTL) reconciliation is a powerful and increasingly popular technique for studying the evolution of microbial gene families. DTL reconciliation requires the use of rooted gene trees to perform the reconciliation with the species tree, and the standard technique for rooting gene trees is to assign a root that results in the minimum reconciliation cost across all rootings of that gene tree. However, even though it is well understood that many gene trees have multiple optimal roots, only a single optimal root is randomly chosen to create the rooted gene tree and perform the reconciliation. This remains an important overlooked and unaddressed problem in DTL reconciliation, leading to incorrect evolutionary inferences. In this work, we perform an in-depth analysis of the impact of uncertain gene tree rooting on the computed DTL reconciliation and provide the first computational tools to quantify and negate the impact of gene tree rooting uncertainty on DTL reconciliation. Our analysis of a large data set of over 4500 gene families from 100 species shows that a large fraction of gene trees have multiple optimal rootings, that these multiple roots often, but not always, appear closely clustered together in the same region of the gene tree, that many aspects of the reconciliation remain conserved across the multiple rootings, that gene tree error has a profound impact on the prevalence and structure of multiple optimal rootings, and that there are specific interesting patterns in the reconciliation of those gene trees that have multiple optimal roots. Our results show that unrooted gene trees can be meaningfully reconciled and high-quality evolutionary information can be obtained from them even after accounting for multiple optimal rootings. In addition, the techniques and tools introduced in this paper make it possible to systematically avoid incorrect evolutionary inferences caused by incorrect or uncertain gene tree rooting.
These tools have been implemented in the phylogenetic reconciliation software package RANGER-DTL 2.0, freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/.
View details for DOI 10.1186/s12859-018-2269-0
View details for Web of Science ID 000442105800011
View details for PubMedID 30367593