Honors & Awards
T32 Training Program Fellowship, Stanford Medicine (Department of Radiation Oncology) (09/2018-)
Cancer Systems Biology Program Fellowship (NIH-R25), Stanford University (09/2015-08/2017)
NCI Speaker/Travel Award, NCI (Systems Analysis of Cancer Biology conference) (04/2016)
Professional Education
Doctor of Philosophy, Texas A&M University College Station (2014)
Master of Science, Sharif University of Technology (2009)
Bachelor of Science, University of Tehran (2007)
- Integrating genomic features for non-invasive early lung cancer detection NATURE 2020
- Distinct Chromatin Accessibility Profiles of Lymphoma Subtypes Revealed By Targeted Cell Free DNA Profiling AMER SOC HEMATOLOGY. 2018
Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA
SCIENCE TRANSLATIONAL MEDICINE
2016; 8 (364)
Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy.
View details for DOI 10.1126/scitranslmed.aai8545
View details for PubMedID 27831904
Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients
Circulating tumour DNA (ctDNA) analysis facilitates studies of tumour heterogeneity. Here we employ CAPP-Seq ctDNA analysis to study resistance mechanisms in 43 non-small cell lung cancer (NSCLC) patients treated with the third-generation epidermal growth factor receptor (EGFR) inhibitor rociletinib. We observe multiple resistance mechanisms in 46% of patients after treatment with first-line inhibitors, indicating frequent intra-patient heterogeneity. Rociletinib resistance recurrently involves MET, EGFR, PIK3CA, ERBB2, KRAS and RB1. We describe a novel EGFR L798I mutation and find that EGFR C797S, which arises in ∼33% of patients after osimertinib treatment, occurs in <3% after rociletinib. Increased MET copy number is the most frequent rociletinib resistance mechanism in this cohort and patients with multiple pre-existing mechanisms (T790M and MET) experience inferior responses. Similarly, rociletinib-resistant xenografts develop MET amplification that can be overcome with the MET inhibitor crizotinib. These results underscore the importance of tumour heterogeneity in NSCLC and the utility of ctDNA-based resistance mechanism assessment.
View details for DOI 10.1038/ncomms11815
View details for PubMedID 27283993
Effect of separate sampling on classification accuracy.
2014; 30 (2): 242-250
Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All code for the synthetic and real data examples is written in MATLAB, as is a function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset. All code is available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.
View details for DOI 10.1093/bioinformatics/btt662
View details for PubMedID 24257187
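The effect described in this abstract can be illustrated with a small closed-form sketch. It is not the paper's MATLAB code or its mmratio function. It assumes two unit-variance Gaussian classes with known means 0 and d, and a linear discriminant that plugs in the training sampling ratio r in place of the unknown class prior c. The function names and the value d = 2 are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lda_error(r, c, d=2.0):
    """True error of a linear discriminant for class 0 ~ N(0, 1), class 1 ~ N(d, 1).

    r: fraction of training points drawn from class 0 (used as the plug-in prior)
    c: true population prior of class 0
    """
    # Decide class 0 when x < t; the threshold shifts with the assumed prior r.
    t = d / 2.0 + math.log(r / (1.0 - r)) / d
    err0 = 1.0 - phi(t)   # class-0 points that land above the threshold
    err1 = phi(t - d)     # class-1 points that land below the threshold
    return c * err0 + (1.0 - c) * err1

# With a true prior of 0.9, compare three sampling ratios.
for r in (0.1, 0.5, 0.9):
    print(f"r={r:.1f}  error={lda_error(r, c=0.9):.4f}")
```

In this toy model the error is smallest when r matches c, and at r = 0.5 the two class-conditional errors coincide, so the error no longer depends on c. That is the sense in which a balanced design is minimax for this equal-variance case; the paper's results cover more general settings.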
Broad Genomic Profiling of Bronchoalveolar Lavage Fluid in Lung Cancer
ELSEVIER SCIENCE INC. 2019: S747–S748
View details for Web of Science ID 000492162204084
- Validated Limited Gene Predictor For Cervical Cancer Lymph Node Metastases ELSEVIER SCIENCE INC. 2019: S50
- Determining cell type abundance and expression from bulk tissues with digital cytometry NATURE BIOTECHNOLOGY 2019; 37 (7): 773-+
- Detection and Surveillance of Bladder Cancer Using Urine Tumor DNA CANCER DISCOVERY 2019; 9 (4): 500–509
Towards Non-Invasive Classification of DLBCL Genetic Subtypes By ctDNA Profiling
American Society of Hematology
View details for DOI 10.1182/blood-2019-132069
Functional significance of U2AF1 S34F mutations in lung adenocarcinomas
View details for DOI 10.1038/s41467-019-13392-y
An experimental design framework for Markovian gene regulatory networks under stationary control policy.
BMC systems biology
2018; 12 (Suppl 8): 137
BACKGROUND: A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty.
RESULTS: In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs) - but only in experimental design, the final policy being an IBR policy.
CONCLUSIONS: Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.
View details for PubMedID 30577732
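The experiment-selection idea in this abstract can be sketched on a toy problem. The sketch below is not the paper's method for Markovian GRNs. It assumes a finite set of candidate models, a finite set of actions standing in for control policies, a made-up cost table, and made-up outcome likelihoods for two hypothetical experiments named A and B. All function names are illustrative.

```python
def ibr_action(prior, cost):
    """Index of the action with the lowest expected cost under the prior."""
    n_actions = len(cost[0])
    expected = [sum(p * cost[k][a] for k, p in enumerate(prior))
                for a in range(n_actions)]
    return min(range(n_actions), key=expected.__getitem__)

def mocu(prior, cost):
    """Expected extra cost of the robust action over each model's own best action."""
    a_ibr = ibr_action(prior, cost)
    return sum(p * (cost[k][a_ibr] - min(cost[k])) for k, p in enumerate(prior))

def expected_remaining_mocu(prior, cost, likelihood):
    """likelihood[k][o] = P(outcome o | model k) for one candidate experiment."""
    total = 0.0
    for o in range(len(likelihood[0])):
        p_o = sum(p * likelihood[k][o] for k, p in enumerate(prior))
        if p_o == 0.0:
            continue  # this outcome cannot occur under the prior
        posterior = [p * likelihood[k][o] / p_o for k, p in enumerate(prior)]
        total += p_o * mocu(posterior, cost)
    return total

# Three candidate models, two actions (all numbers are invented for illustration).
prior = [0.5, 0.3, 0.2]
cost = [[0.1, 0.6], [0.7, 0.2], [0.9, 0.3]]
experiments = {
    "A": [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],  # separates model 0 from models 1 and 2
    "B": [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],  # separates model 1 from model 2
}
scores = {name: expected_remaining_mocu(prior, cost, lik)
          for name, lik in experiments.items()}
best = min(scores, key=scores.get)
print(mocu(prior, cost), scores, best)
```

Experiment A is preferred here because models 1 and 2 share the same best action, so telling them apart (experiment B) does little to reduce the cost that uncertainty imposes on the objective.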
Circulating Tumor DNA Measurements As Early Outcome Predictors in Diffuse Large B-Cell Lymphoma.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
2018; 36 (28): 2845-+
Purpose: Outcomes for patients with diffuse large B-cell lymphoma remain heterogeneous, with existing methods failing to consistently predict treatment failure. We examined the additional prognostic value of circulating tumor DNA (ctDNA) before and during therapy for predicting patient outcomes.
Patients and Methods: We studied the dynamics of ctDNA from 217 patients treated at six centers, using a training and validation framework. We densely characterized early ctDNA dynamics during therapy using cancer personalized profiling by deep sequencing to define response-associated thresholds within a discovery set. These thresholds were assessed in two independent validation sets. Finally, we assessed the prognostic value of ctDNA in the context of established risk factors, including the International Prognostic Index and interim positron emission tomography/computed tomography scans.
Results: Before therapy, ctDNA was detectable in 98% of patients; pretreatment levels were prognostic in both front-line and salvage settings. In the discovery set, ctDNA levels changed rapidly, with a 2-log decrease after one cycle (early molecular response [EMR]) and a 2.5-log decrease after two cycles (major molecular response [MMR]) stratifying outcomes. In the first validation set, patients receiving front-line therapy achieving EMR or MMR had superior outcomes at 24 months (EMR: EFS, 83% v 50%; P = .0015; MMR: EFS, 82% v 46%; P < .001). EMR also predicted superior 24-month outcomes in patients receiving salvage therapy in the first validation set (EFS, 100% v 13%; P = .011). The prognostic value of EMR and MMR was further confirmed in the second validation set. In multivariable analyses including International Prognostic Index and interim positron emission tomography/computed tomography scans across both cohorts, molecular response was independently prognostic of outcomes, including event-free and overall survival.
Conclusion: Pretreatment ctDNA levels and molecular responses are independently prognostic of outcomes in aggressive lymphomas. These risk factors could potentially guide future personalized risk-directed approaches.
View details for PubMedID 30125215
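The EMR and MMR definitions in this abstract reduce to a log10 fold-change check, sketched below. Only the 2-log and 2.5-log thresholds come from the abstract. The function names, the requirement that both measurements share the same units, and the handling of undetectable on-treatment ctDNA are assumptions made for illustration, not the study's analysis code.

```python
import math

EMR_LOG_DROP = 2.0   # early molecular response: 2-log decrease after one cycle
MMR_LOG_DROP = 2.5   # major molecular response: 2.5-log decrease after two cycles

def log_drop(pretreatment, on_treatment):
    """log10 fold decrease in ctDNA level (both values in the same units)."""
    if pretreatment <= 0:
        raise ValueError("pretreatment ctDNA must be detectable")
    if on_treatment <= 0:
        # Assumption: undetectable ctDNA is treated as an unbounded decrease.
        return math.inf
    return math.log10(pretreatment / on_treatment)

def molecular_response(pretreatment, on_treatment, cycles_completed):
    """True if the drop meets the EMR (1 cycle) or MMR (2 cycles) threshold."""
    threshold = {1: EMR_LOG_DROP, 2: MMR_LOG_DROP}[cycles_completed]
    return log_drop(pretreatment, on_treatment) >= threshold

# A fall from 1000 to 5 units is about a 2.3-log decrease.
print(molecular_response(1000.0, 5.0, cycles_completed=1))
print(molecular_response(1000.0, 5.0, cycles_completed=2))
```

The same measured drop can therefore qualify as a response after one cycle yet fall short of the stricter threshold applied after two cycles.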
- Optimal Bayesian Kalman Filtering With Prior Update IEEE TRANSACTIONS ON SIGNAL PROCESSING 2018; 66 (8): 1982–96
Detection and surveillance of bladder cancer using urine tumor DNA.
Current regimens for the detection and surveillance of bladder cancer (BLCA) are invasive and have suboptimal sensitivity. Here, we present a novel high-throughput sequencing (HTS) method for detection of urine tumor DNA (utDNA) called utDNA CAPP-Seq (uCAPP-Seq) and apply it to 67 healthy adults and 118 patients with early-stage BLCA who either had urine collected prior to treatment or during surveillance. Using this targeted sequencing approach, we detected a median of 6 mutations per BLCA patient and observed surprisingly frequent mutations of the PLEKHS1 promoter (46%), suggesting these mutations represent a useful biomarker for detection of BLCA. We detected utDNA pre-treatment in 93% of cases using a tumor mutation-informed approach and in 84% when blinded to tumor mutation status, with 96-100% specificity. In the surveillance setting, we detected utDNA in 91% of patients who ultimately recurred, with utDNA detection preceding clinical progression in 92% of cases. uCAPP-Seq outperformed a commonly used ancillary test (UroVysion, p = 0.02) and cytology and cystoscopy combined (p ≤ 0.006), detecting 100% of BLCA cases detected by cytology and 82% that cytology missed. Our results indicate that uCAPP-Seq is a promising approach for early detection and surveillance of BLCA.
View details for PubMedID 30578357
Clinical Impact of Somatic Copy Number Alterations in Circulating Tumor DNA from Diverse Lymphoma Subtypes
AMER SOC HEMATOLOGY. 2017
View details for Web of Science ID 000432419406380
- Noninvasive detection of clinically relevant copy number alterations in diffuse large B-cell lymphoma. AMER SOC CLINICAL ONCOLOGY. 2017
- Intrinsically Bayesian Robust Kalman Filter: An Innovation Process Approach IEEE TRANSACTIONS ON SIGNAL PROCESSING 2017; 65 (10): 2531-2546
Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling.
Identifying molecular residual disease (MRD) after treatment of localized lung cancer could facilitate early intervention and personalization of adjuvant therapies. Here we apply Cancer Personalized Profiling by Deep Sequencing (CAPP-Seq) circulating tumor DNA (ctDNA) analysis to 255 samples from 40 patients treated with curative intent for stage I-III lung cancer and 54 healthy adults. In 94% of evaluable patients experiencing recurrence, ctDNA was detectable in the first post-treatment blood sample, indicating reliable identification of MRD. Post-treatment ctDNA detection preceded radiographic progression in 72% of patients by a median of 5.2 months and 53% of patients harbored ctDNA mutation profiles associated with favorable responses to tyrosine kinase inhibitors or immune checkpoint blockade. Collectively, these results indicate that ctDNA MRD in lung cancer patients can be accurately detected using CAPP-Seq and may allow personalized adjuvant treatment while disease burden is lowest.
View details for PubMedID 28899864
Constructing Pathway-based Priors Within a Gaussian Mixture Model for Bayesian Regression and Classification.
IEEE/ACM transactions on computational biology and bioinformatics
Gene-expression-based classification and regression are major concerns in translational genomics. If the feature-label distribution is known, then an optimal classifier can be derived. If the regressor-target distribution is known, then an optimal regression function can be derived. In practice, neither is known, data must be employed, and, for small samples, prior knowledge concerning the feature-label or regressor-target distribution can be used in the learning process. Optimal Bayesian classification and optimal Bayesian regression provide optimality under uncertainty. With optimal Bayesian classification (or regression), uncertainty is treated directly on the feature-label (or regressor-target) distribution. The fundamental engineering problem is prior construction. The Regularized Expected Mean Log-Likelihood Prior (REML) utilizes pathway information and provides viable priors for the feature-label distribution, assuming that the training data contain labels. In practice, the labels may not be observed. This paper extends the REML methodology to a Gaussian mixture model (GMM) when the labels are unknown. Prior construction bundled with prior update via Bayesian sampling results in Monte Carlo approximations to the optimal Bayesian regression function and optimal Bayesian classifier. Simulations demonstrate that the GMM REML prior yields better performance than the EM algorithm for small data sets. We apply it to phenotype classification when the prior knowledge consists of colon cancer pathways.
View details for DOI 10.1109/TCBB.2017.2778715
View details for PubMedID 29990066
Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors.
2017; 18 (Suppl 14): 552
Phenotypic classification is problematic because small samples are ubiquitous; and, for these, use of prior knowledge is critical. If knowledge concerning the feature-label distribution - for instance, genetic pathways - is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods in which a classification model is assumed and prior distributions are placed on model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods.
The salient problem confronting optimal Bayesian classification is prior construction. In this paper, we propose a new prior construction methodology based on a general framework of constraints in the form of conditional probability statements. We call this prior the maximal knowledge-driven information prior (MKDIP). The new constraint framework is more flexible than our previous methods as it naturally handles the potential inconsistency in archived regulatory relationships and conditioning can be augmented by other knowledge, such as population statistics. We also extend the application of prior construction to a multinomial mixture model when labels are unknown, which often occurs in practice. The performance of the proposed methods is examined on two important pathway families, the mammalian cell-cycle and a set of p53-related pathways, and also on a publicly available gene expression dataset of non-small cell lung cancer when combined with the existing prior knowledge on relevant signaling pathways.
The newly proposed general prior construction framework extends the prior construction methodology to a more flexible framework that results in better inference when proper prior knowledge exists. Moreover, the extension of optimal Bayesian classification to multinomial mixtures where data sets are both small and unlabeled enables superior classifier design using small, unstructured data sets. We have demonstrated the effectiveness of our approach using pathway information and available knowledge of gene regulating functions; however, the underlying theory can be applied to a wide variety of knowledge types, and other applications when there are small samples.
View details for PubMedID 29297278
Development and Validation of Biopsy-Free Genotyping for Molecular Subtyping of Diffuse Large B-Cell Lymphoma
58th Annual Meeting and Exposition of the American-Society-of-Hematology
AMER SOC HEMATOLOGY. 2016
View details for Web of Science ID 000394446803093
- Noninvasive Cancer Classification Using Diverse Genomic Features in Circulating Tumor DNA ASSOC COMPUTING MACHINERY. 2016: 516
An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
2015; 12 (6): 1304-1321
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete nor regulatory, and have no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet prior in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.
View details for DOI 10.1109/TCBB.2015.2424407
View details for Web of Science ID 000368292400011
View details for PubMedID 26671803
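The role of the Dirichlet prior in discrete classification can be sketched as follows. The paper derives the Dirichlet parameters from pathway constraints by optimization; the sketch below skips that step and simply assumes hand-picked hyperparameters. It also assumes a known class prior c0 and uses the posterior mean bin probabilities as the effective class-conditional densities. All names and numbers are illustrative.

```python
def posterior_mean(alpha, counts):
    """Posterior mean bin probabilities under a Dirichlet(alpha) prior
    after observing multinomial bin counts (conjugate update)."""
    total = sum(alpha) + sum(counts)
    return [(a + n) / total for a, n in zip(alpha, counts)]

def classify_bins(c0, alpha0, counts0, alpha1, counts1):
    """Label each bin with the class whose prior-weighted posterior mean is larger."""
    p0 = posterior_mean(alpha0, counts0)
    p1 = posterior_mean(alpha1, counts1)
    return [0 if c0 * a >= (1.0 - c0) * b else 1 for a, b in zip(p0, p1)]

# Four bins, e.g. the joint states of two binary genes (invented numbers).
alpha0, counts0 = [4, 2, 1, 1], [3, 1, 0, 0]   # class 0: prior favors the first bins
alpha1, counts1 = [1, 1, 2, 4], [0, 1, 1, 2]   # class 1: prior favors the last bins
labels = classify_bins(0.5, alpha0, counts0, alpha1, counts1)
print(posterior_mean(alpha0, counts0), labels)
```

With only four or five training points per class, the hyperparameters carry about as much weight as the data, which is why the quality of the constructed prior matters so much for small samples.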
- Discrete optimal Bayesian classification with error-conditioned sequential sampling PATTERN RECOGNITION 2015; 48 (11): 3766-3782
- Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11 (1): 202-218
Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification
2013; 46 (10): 2783-2797
Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers that use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.
View details for DOI 10.1016/j.patcog.2013.02.017
View details for Web of Science ID 000320477400014
View details for PubMedCentralID PMC4535735
Identification and Analysis of the First 2009 Pandemic H1N1 Influenza Virus from US Feral Swine
ZOONOSES AND PUBLIC HEALTH
2013; 60 (5): 327-335
The first case of pandemic H1N1 influenza (pH1N1) virus in feral swine in the United States was identified in Texas through the United States Department of Agriculture (USDA) Wildlife Services' surveillance program. Two samples were identified as pandemic influenza by reverse transcriptase quantitative PCR (RT-qPCR). Full-genome Sanger sequencing of all eight influenza segments was performed. In addition, Illumina deep sequencing of the original diagnostic samples and their respective virus isolation cultures was performed to assess the feasibility of using an unbiased whole-genome linear target amplification method and multiple sample sequencing in a single Illumina GAIIx lane. Identical sequences were obtained using both techniques. Phylogenetic analysis indicated that all gene segments belonged to the pH1N1 (2009) lineage. In conclusion, we have identified the first pH1N1 isolate in feral swine in the United States and have demonstrated the use of an easy unbiased linear amplification method for deep sequencing of multiple samples.
View details for DOI 10.1111/zph.12006
View details for Web of Science ID 000321666200002
View details for PubMedID 22978260
Probabilistic reconstruction of the tumor progression process in gene regulatory networks in the presence of uncertainty
8th Annual Conference of the MidSouth-Computational-Biology-and-Bioinformatics-Society (MCBIOS)
BIOMED CENTRAL LTD. 2011
Accumulation of gene mutations in cells is known to be responsible for tumor progression, driving it from benign states to malignant states. However, previous studies have shown that the detailed sequence of gene mutations, or the steps in tumor progression, may vary from tumor to tumor, making it difficult to infer the exact path that a given type of tumor may have taken.
In this paper, we propose an effective probabilistic algorithm for reconstructing the tumor progression process based on partial knowledge of the underlying gene regulatory network and the steady-state distribution (SSD) of the gene expression values in a given tumor. We take the BNp (Boolean networks with perturbation) framework to model the gene regulatory networks. We assume that the true network is not exactly known but we are given an uncertainty class of networks that contains the true network. This network uncertainty class arises from our partial knowledge of the true network, typically represented as a set of local pathways that are embedded in the global network. Given the SSD of the cancerous network, we aim to simultaneously identify the true normal (healthy) network and the set of gene mutations that drove the network into the cancerous state. This is achieved by analyzing the effect of gene mutation on the SSD of a gene regulatory network. At each step, the proposed algorithm reduces the uncertainty class by keeping only those networks whose SSDs get close enough to the cancerous SSD as a result of additional gene mutation. These steps are repeated until we can find the best candidate for the true network and the most probable path of tumor progression.
Simulation results based on both synthetic networks and networks constructed from actual pathway knowledge show that the proposed algorithm can identify the normal network and the actual path of tumor progression with high probability. The algorithm is also robust to model mismatch and allows us to control the trade-off between efficiency and accuracy.
View details for DOI 10.1186/1471-2105-12-S10-S9
View details for Web of Science ID 000303933600009
View details for PubMedID 22166046
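The central quantity in this abstract, the steady-state distribution of a Boolean network with perturbation, can be computed directly for a tiny network. The sketch below is not the paper's algorithm. It assumes a made-up two-gene network, models a mutation as one gene stuck at 1, uses a perturbation probability of 0.05, and finds the steady state by repeated multiplication with the transition matrix. All names are illustrative.

```python
from itertools import product

def transition_matrix(update, n, p):
    """Markov chain of a Boolean network with perturbation: each gene flips
    independently with probability p; if no gene flips, the update rule applies."""
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        P[index[s]][index[update(s)]] += (1 - p) ** n
        for t in states:
            h = sum(a != b for a, b in zip(s, t))  # number of flipped genes
            if h:
                P[index[s]][index[t]] += p ** h * (1 - p) ** (n - h)
    return states, P

def steady_state(P, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def healthy(s):
    a, b = s
    return (0, a)   # gene a switches off, gene b copies a

def mutant(s):
    a, b = s
    return (1, a)   # assumed mutation: gene a is stuck on

states, P_h = transition_matrix(healthy, 2, 0.05)
_, P_m = transition_matrix(mutant, 2, 0.05)
ssd_h, ssd_m = steady_state(P_h), steady_state(P_m)
l1 = sum(abs(x - y) for x, y in zip(ssd_h, ssd_m))
print(dict(zip(states, ssd_h)), dict(zip(states, ssd_m)), l1)
```

A distance such as the L1 value computed here is one way to judge whether a candidate network, after a candidate mutation, has an SSD close enough to an observed cancerous SSD to stay in the uncertainty class.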