Honors & Awards
Clinical Data Science Fellow, National Library of Medicine - National Institutes of Health (9/2019 - 9/2020)
Cancer Systems Biology Scholars Fellow, National Institutes of Health - National Cancer Institute (8/2017 - 8/2019)
Patents
Shahram Shirani, Roozbeh Dehghannasiri. "United States Patent 9,294,711: De-interlacing and frame rate upconversion for high definition video", McMaster University, Mar 22, 2016
Publications
The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans.
Science (New York, N.Y.)
2022; 376 (6594): eabl4896
Molecular characterization of cell types using single-cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues, and tissue-specific variation in gene expression. Using multiple tissues from a single donor enabled identification of the clonal distribution of T cells between tissues, identification of the tissue-specific mutation rate in B cells, and analysis of the cell cycle state and proliferative potential of shared cell types across tissues. Cell type-specific RNA splicing was discovered and analyzed across tissues within an individual.
DOI: 10.1126/science.abl4896
PubMed ID: 35549404
The SpliZ generalizes 'percent spliced in' to reveal regulated splicing at single-cell resolution.
Detecting single-cell-regulated splicing from droplet-based technologies is challenging. Here, we introduce the splicing Z score (SpliZ), an annotation-free statistical method to detect regulated splicing in single-cell RNA sequencing. We applied the SpliZ to human lung cells, discovering hundreds of genes with cell-type-specific splicing patterns including ones with potential implications for basic and translational biology.
DOI: 10.1038/s41592-022-01400-x
PubMed ID: 35241832
RNA splicing programs define tissue compartments and cell types at single cell resolution.
The extent to which splicing is regulated at single-cell resolution has remained controversial due to limitations in both the available data and the methods to interpret it. We apply the SpliZ, a new statistical approach, to detect cell-type-specific splicing in >110K cells from 12 human tissues. In the 10x data used for discovery, 9.1% of genes with computable SpliZ scores are cell-type-specifically spliced, including the ubiquitously expressed genes MYL6 and RPS24. These results are validated with RNA FISH, single-cell PCR, and Smart-seq2. SpliZ analysis reveals 170 genes with regulated splicing during human spermatogenesis, including examples conserved in mouse and mouse lemur. The SpliZ allows model-based identification of subpopulations indistinguishable based on gene expression, illustrated by subpopulation-specific splicing of classical monocytes involving an ultraconserved exon in SAT1. Together, this analysis of differential splicing across multiple organs establishes that splicing is regulated cell-type-specifically.
DOI: 10.7554/eLife.70692
PubMed ID: 34515025
Specific splice junction detection in single cells with SICILIAN.
2021; 22 (1): 219
Precise splice junction calls are currently unavailable in scRNA-seq pipelines such as the 10x Chromium platform but are critical for understanding single-cell biology. Here, we introduce SICILIAN, a new method that assigns statistical confidence to splice junctions from a spliced aligner to improve precision. SICILIAN is a general method that can be applied to bulk or single-cell data, but has particular utility for single-cell analysis due to that data's unique challenges and opportunities for discovery. SICILIAN's precise splice detection achieves high accuracy on simulated data, improves concordance between matched single-cell and bulk datasets, and increases agreement between biological replicates. SICILIAN detects unannotated splicing in single cells, enabling the discovery of novel splicing regulation through single-cell analysis workflows.
DOI: 10.1186/s13059-021-02434-8
PubMed ID: 34353340
Improved detection of gene fusions by applying statistical methods reveals oncogenic RNA cancer drivers.
Proceedings of the National Academy of Sciences of the United States of America
The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), an algorithm that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling 10-fold fewer false-positive fusions in nontransformed human tissues. We leverage the increased precision of DEEPEST to discover fundamental cancer biology. Namely, 888 candidate oncogenes are identified based on overrepresentation in DEEPEST calls, as are 1,078 previously unreported fusions involving long intergenic noncoding RNAs, demonstrating their previously unappreciated prevalence and potential for function. DEEPEST also reveals a high enrichment for fusions involving oncogenes in cancers, including ovarian cancer, which has had minimal treatment advances in recent decades, finding that more than 50% of tumors harbor gene fusions predicted to be oncogenic. Specific protein domains are enriched in DEEPEST calls, indicating a global selection for fusion functionality: kinase domains are nearly 2-fold more enriched in DEEPEST calls than expected by chance, as are domains involved in (anaerobic) metabolism and DNA binding. The statistical algorithms, population-level analytic framework, and biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy.
DOI: 10.1073/pnas.1900391116
PubMed ID: 31308241
Ambiguous splice sites distinguish circRNA and linear splicing in the human genome
2019; 35 (8): 1263–68
DOI: 10.1093/bioinformatics/bty785
Web of Science ID: 000473691900001
TGS1 impacts snRNA 3'-end processing, ameliorates survival motor neuron-dependent neurological phenotypes in vivo and prevents neurodegeneration.
Nucleic Acids Research
Trimethylguanosine synthase 1 (TGS1) is a highly conserved enzyme that converts the 5'-monomethylguanosine cap of small nuclear RNAs (snRNAs) to a trimethylguanosine cap. Here, we show that loss of TGS1 in Caenorhabditis elegans, Drosophila melanogaster and Danio rerio results in neurological phenotypes similar to those caused by survival motor neuron (SMN) deficiency. Importantly, expression of human TGS1 ameliorates the SMN-dependent neurological phenotypes in both flies and worms, revealing that TGS1 can partly counteract the effects of SMN deficiency. TGS1 loss in HeLa cells leads to the accumulation of immature U2 and U4atac snRNAs with long 3' tails that are often uridylated. snRNAs with defective 3' terminations also accumulate in Drosophila Tgs1 mutants. Consistent with defective snRNA maturation, TGS1 and SMN mutant cells also exhibit partially overlapping transcriptome alterations that include aberrantly spliced and readthrough transcripts. Together, these results identify a neuroprotective function for TGS1 and reinforce the view that defective snRNA maturation affects neuronal viability and function.
DOI: 10.1093/nar/gkac659
PubMed ID: 35947650
Cell types of origin of the cell-free transcriptome.
Cell-free RNA from liquid biopsies can be analyzed to determine disease tissue of origin. We extend this concept to identify cell types of origin using the Tabula Sapiens transcriptomic cell atlas as well as individual tissue transcriptomic cell atlases in combination with the Human Protein Atlas RNA consensus dataset. We define cell type signature scores, which allow the inference of cell types that contribute to cell-free RNA for a variety of diseases.
DOI: 10.1038/s41587-021-01188-9
PubMed ID: 35132263
ReadZS detects cell type-specific and developmentally regulated RNA processing programs in single-cell RNA-seq.
2022; 23 (1): 226
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis, including global 3' UTR shortening in human spermatogenesis. ReadZS also discovers global 3' UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
DOI: 10.1186/s13059-022-02795-8
PubMed ID: 36284317
SICILIAN: Precise and unbiased detection of gene fusions at the resolution of single cells using improved statistical modeling
American Association for Cancer Research, 2020
DOI: 10.1158/1538-7445.AM2020-3378
Web of Science ID: 000590059301067
Towards precise and cost-effective fusion discovery: A landscape of druggable gene fusions across TCGA cancers
American Association for Cancer Research, 2019
DOI: 10.1158/1538-7445.AM2019-2468
Web of Science ID: 000488279400427
An experimental design framework for Markovian gene regulatory networks under stationary control policy.
BMC Systems Biology
2018; 12 (Suppl 8): 137
BACKGROUND: A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty.
RESULTS: In this paper, we employ the mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, namely the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution, which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design, this entire process is iterated. Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs), but only in experimental design; the final policy is still an IBR policy.
CONCLUSIONS: Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.
PubMed ID: 30577732
A Bayesian robust Kalman smoothing framework for state-space models with uncertain noise statistics
EURASIP Journal on Advances in Signal Processing
DOI: 10.1186/s13634-018-0577-1
Web of Science ID: 000443927700001
Sequential Experimental Design for Optimal Structural Intervention in Gene Regulatory Networks Based on the Mean Objective Cost of Uncertainty
2018; 17: 1176935118790247
Scientists are attempting to use models of ever-increasing complexity, especially in medicine, where gene-based diseases such as cancer require better modeling of cell regulation. Complex models suffer from uncertainty, and experiments are needed to reduce this uncertainty. Because experiments can be costly and time-consuming, it is desirable to identify the experiments that provide the most useful information. If a sequence of experiments is to be performed, experimental design is needed to determine the order. A classical approach is to maximally reduce the overall uncertainty in the model, meaning maximal entropy reduction. A recently proposed method takes into account both model uncertainty and the translational objective, for instance, optimal structural intervention in gene regulatory networks, where the aim is to alter the regulatory logic to maximally reduce the long-run likelihood of being in a cancerous state. The mean objective cost of uncertainty (MOCU) quantifies uncertainty based on the degree to which model uncertainty affects the objective. Experimental design involves choosing the experiment that yields the greatest reduction in MOCU. This article introduces finite-horizon dynamic programming for MOCU-based sequential experimental design and compares it with the greedy approach, which selects one experiment at a time without consideration of the full horizon of experiments. A salient aspect of the article is that it demonstrates the advantage of MOCU-based design over the widely used entropy-based design for both greedy and dynamic programming strategies and investigates the effect of model conditions on their comparative performance.
PubMed ID: 30093796
Optimal Bayesian Kalman Filtering With Prior Update
IEEE Transactions on Signal Processing
2018; 66 (8): 1982–96
DOI: 10.1109/TSP.2017.2788419
Web of Science ID: 000426694700004
Intrinsically Bayesian robust Karhunen-Loève compression
2018; 144: 311-322
DOI: 10.1016/j.sigpro.2017.10.016
Intrinsically Bayesian Robust Kalman Filter: An Innovation Process Approach
IEEE Transactions on Signal Processing
2017; 65 (10): 2531-2546
DOI: 10.1109/TSP.2017.2656845
Web of Science ID: 000398670800005
Optimal experimental design for materials discovery
Computational Materials Science
2017; 129: 311-322
DOI: 10.1016/j.commatsci.2016.11.041
Web of Science ID: 000394065000038
Optimal experimental design in the context of canonical expansions
IET Signal Processing
2017; 11 (8): 942-951
DOI: 10.1049/iet-spr.2017.0016
Optimal Objective-Based Experimental Design for Uncertain Dynamical Gene Networks with Experimental Error.
IEEE/ACM Transactions on Computational Biology and Bioinformatics
In systems biology, network models are often used to study interactions among cellular components, a salient aim being to develop drugs and therapeutic mechanisms to change the dynamical behavior of the network to avoid undesirable phenotypes. Owing to limited knowledge, model uncertainty is commonplace and network dynamics can be updated in different ways, thereby giving multiple dynamic trajectories, that is, dynamics uncertainty. In this manuscript, we propose an experimental design method that can effectively reduce the dynamics uncertainty and improve performance in an interaction-based network. Both dynamics uncertainty and experimental error are quantified with respect to the modeling objective, herein, therapeutic intervention. The aim of experimental design is to select among a set of candidate experiments the experiment whose outcome, when applied to the network model, maximally reduces the dynamics uncertainty pertinent to the intervention objective.
PubMed ID: 27576263
Efficient experimental design for uncertainty reduction in gene regulatory networks
An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, high-priority experiments are conducted first.
The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one that leads to the minimum expected remaining MOCU after the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks.
Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method performs close to the optimal method at a lower computational cost. The proposed approximate method also significantly outperforms the random selection policy. MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/.
DOI: 10.1186/1471-2105-16-S13-S2
Web of Science ID: 000367879400002
PubMed ID: 26423515
PubMed Central ID: PMC4597030
Optimal Experimental Design for Gene Regulatory Networks in the Presence of Uncertainty
IEEE/ACM Transactions on Computational Biology and Bioinformatics
2015; 12 (4): 938-950
Of major interest to translational genomics is the intervention in gene regulatory networks (GRNs) to affect cell behavior; in particular, to alter pathological phenotypes. Owing to the complexity of GRNs, accurate network inference is practically challenging and GRN models often contain considerable amounts of uncertainty. Considering the cost and time required for conducting biological experiments, it is desirable to have a systematic method for prioritizing potential experiments so that an experiment can be chosen to optimally reduce network uncertainty. Moreover, from a translational perspective it is crucial that GRN uncertainty be quantified and reduced in a manner that pertains to the operational cost that it induces, such as the cost of network intervention. In this work, we utilize the concept of mean objective cost of uncertainty (MOCU) to propose a novel framework for optimal experimental design. In the proposed framework, potential experiments are prioritized based on the MOCU expected to remain after conducting the experiment. Based on this prioritization, one can select an optimal experiment with the largest potential to reduce the pertinent uncertainty present in the current network model. We demonstrate the effectiveness of the proposed method via extensive simulations based on synthetic and real gene regulatory networks.
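The prioritization rule described in this abstract (rank candidate experiments by the MOCU expected to remain after each one and pick the smallest) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation and does not model a gene regulatory network: the uncertainty class of two unknown binary parameters, the uniform prior, the candidate actions, and the cost function `toy_cost` are all hypothetical, and every model is enumerated exhaustively rather than approximated.

```python
from itertools import product


def mocu(models, prior, cost, actions):
    """Mean objective cost of uncertainty over a (sub)class of candidate models.

    MOCU = E_theta[ C_theta(psi_IBR) - C_theta(psi_theta) ], where psi_IBR minimizes
    the expected cost over the class and psi_theta is optimal for model theta alone.
    """
    total = sum(prior[m] for m in models)
    # Renormalize the prior over the models still compatible with what is known.
    p = {m: prior[m] / total for m in models}
    # Robust (IBR) action: best on average over the remaining uncertainty class.
    ibr = min(actions, key=lambda a: sum(p[m] * cost(m, a) for m in models))
    # Expected extra cost paid for not knowing the true model.
    return sum(p[m] * (cost(m, ibr) - min(cost(m, a) for a in actions)) for m in models)


def expected_remaining_mocu(models, prior, cost, actions, param):
    """Expected MOCU after an experiment that reveals unknown parameter `param`."""
    total = sum(prior[m] for m in models)
    remaining = 0.0
    for value in sorted({m[param] for m in models}):
        subclass = [m for m in models if m[param] == value]
        weight = sum(prior[m] for m in subclass) / total  # probability of this outcome
        remaining += weight * mocu(subclass, prior, cost, actions)
    return remaining


def best_experiment(models, prior, cost, actions, n_params):
    """Index of the parameter whose measurement leaves the least expected MOCU."""
    return min(range(n_params),
               key=lambda i: expected_remaining_mocu(models, prior, cost, actions, i))


# Hypothetical toy uncertainty class: two unknown binary parameters (a, b), uniform prior.
MODELS = list(product([0, 1], repeat=2))
PRIOR = {m: 0.25 for m in MODELS}
ACTIONS = [0, 1]  # hypothetical candidate interventions


def toy_cost(model, action):
    # Hypothetical cost: mismatching parameter a is far more costly than mismatching b.
    a, b = model
    return 10 * (action != a) + (action != b)


if __name__ == "__main__":
    print("MOCU before any experiment:", mocu(MODELS, PRIOR, toy_cost, ACTIONS))
    for i, name in enumerate("ab"):
        print(f"expected remaining MOCU if {name} is measured:",
              expected_remaining_mocu(MODELS, PRIOR, toy_cost, ACTIONS, i))
    print("chosen experiment index:", best_experiment(MODELS, PRIOR, toy_cost, ACTIONS, 2))
```

In this toy setting the initial MOCU is 5.0; measuring a leaves an expected MOCU of 0.0 while measuring b leaves 4.5, so the sketch chooses to measure a. The point mirrors the objective-based view in the abstract: the preferred experiment is the one that resolves the uncertainty that actually changes the intervention decision, not merely any uncertainty in the model.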
DOI: 10.1109/TCBB.2014.2377733
Web of Science ID: 000359264900027
PubMed ID: 26357334