Master of Science, Sharif University of Technology (2009)
Doctor of Philosophy, Texas A&M University College Station (2014)
Bachelor of Science, University of Tehran (2007)
Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA
SCIENCE TRANSLATIONAL MEDICINE
2016; 8 (364)
Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy.
View details for DOI 10.1126/scitranslmed.aai8545
View details for Web of Science ID 000389448100006
View details for PubMedID 27831904
Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients
Circulating tumour DNA (ctDNA) analysis facilitates studies of tumour heterogeneity. Here we employ CAPP-Seq ctDNA analysis to study resistance mechanisms in 43 non-small cell lung cancer (NSCLC) patients treated with the third-generation epidermal growth factor receptor (EGFR) inhibitor rociletinib. We observe multiple resistance mechanisms in 46% of patients after treatment with first-line inhibitors, indicating frequent intra-patient heterogeneity. Rociletinib resistance recurrently involves MET, EGFR, PIK3CA, ERBB2, KRAS and RB1. We describe a novel EGFR L798I mutation and find that EGFR C797S, which arises in ∼33% of patients after osimertinib treatment, occurs in <3% after rociletinib. Increased MET copy number is the most frequent rociletinib resistance mechanism in this cohort and patients with multiple pre-existing mechanisms (T790M and MET) experience inferior responses. Similarly, rociletinib-resistant xenografts develop MET amplification that can be overcome with the MET inhibitor crizotinib. These results underscore the importance of tumour heterogeneity in NSCLC and the utility of ctDNA-based resistance mechanism assessment.
View details for DOI 10.1038/ncomms11815
View details for Web of Science ID 000378007200001
View details for PubMedID 27283993
View details for PubMedCentralID PMC4906406
Effect of separate sampling on classification accuracy.
2014; 30 (2): 242-250
Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the code for the synthetic and real data examples is written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the code is available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.
View details for DOI 10.1093/bioinformatics/btt662
View details for PubMedID 24257187
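The separate-sampling effect described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code or their mmratio function. It assumes a toy model chosen here for illustration: one feature, class 0 distributed as N(0, 1), class 1 as N(mu, 1) with mu > 0, and a sample large enough that the only error in the plug-in linear discriminant is the assumed class-0 prior r (the sampling ratio). The names phi, class_errors, true_error, and minimax_ratio are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def class_errors(r, mu):
    """Class-conditional errors of the rule that assumes prior r for class 0.

    Assumed toy model: class 0 ~ N(0, 1), class 1 ~ N(mu, 1), mu > 0.
    The rule decides class 0 when x falls below the threshold t.
    """
    t = mu / 2.0 + math.log(r / (1.0 - r)) / mu
    eps0 = 1.0 - phi(t)   # class-0 points above the threshold
    eps1 = phi(t - mu)    # class-1 points below the threshold
    return eps0, eps1


def true_error(r, c, mu):
    """Error under the true class-0 prior c when the rule assumes prior r."""
    eps0, eps1 = class_errors(r, mu)
    return c * eps0 + (1.0 - c) * eps1


def minimax_ratio(mu, grid=999):
    """Grid search for the ratio r that minimizes the worst-case error over c.

    The error is linear in c, so its maximum over c in [0, 1] is the larger
    of the two class-conditional errors.
    """
    candidates = [(i + 1) / (grid + 1) for i in range(grid)]
    return min(candidates, key=lambda r: max(class_errors(r, mu)))
```

In this symmetric equal-variance toy model the worst-case error is minimized when the two class-conditional errors are equal, which happens at r = 0.5. A sampling ratio that matches the true prior (r = c) gives the lowest error for that particular c, and any other ratio gives a higher one.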
- Intrinsically Bayesian Robust Kalman Filter: An Innovation Process Approach IEEE TRANSACTIONS ON SIGNAL PROCESSING 2017; 65 (10): 2531-2546
Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors.
2017; 18 (Suppl 14): 552
Phenotypic classification is problematic because small samples are ubiquitous; and, for these, use of prior knowledge is critical. If knowledge concerning the feature-label distribution - for instance, genetic pathways - is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods in which a classification model is assumed and prior distributions are placed on model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods. The salient problem confronting optimal Bayesian classification is prior construction. In this paper, we propose a new prior construction methodology based on a general framework of constraints in the form of conditional probability statements. We call this prior the maximal knowledge-driven information prior (MKDIP). The new constraint framework is more flexible than our previous methods as it naturally handles the potential inconsistency in archived regulatory relationships and conditioning can be augmented by other knowledge, such as population statistics. We also extend the application of prior construction to a multinomial mixture model when labels are unknown, which often occurs in practice. The performance of the proposed methods is examined on two important pathway families, the mammalian cell-cycle and a set of p53-related pathways, and also on a publicly available gene expression dataset of non-small cell lung cancer when combined with the existing prior knowledge on relevant signaling pathways. The new proposed general prior construction framework extends the prior construction methodology to a more flexible framework that results in better inference when proper prior knowledge exists. Moreover, the extension of optimal Bayesian classification to multinomial mixtures where data sets are both small and unlabeled enables superior classifier design using small, unstructured data sets. We have demonstrated the effectiveness of our approach using pathway information and available knowledge of gene regulating functions; however, the underlying theory can be applied to a wide variety of knowledge types, and other applications when there are small samples.
View details for DOI 10.1186/s12859-017-1893-4
View details for PubMedID 29297278
View details for PubMedCentralID PMC5751802
Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling.
Identifying molecular residual disease (MRD) after treatment of localized lung cancer could facilitate early intervention and personalization of adjuvant therapies. Here we apply Cancer Personalized Profiling by Deep Sequencing (CAPP-Seq) circulating tumor DNA (ctDNA) analysis to 255 samples from 40 patients treated with curative intent for stage I-III lung cancer and 54 healthy adults. In 94% of evaluable patients experiencing recurrence, ctDNA was detectable in the first post-treatment blood sample, indicating reliable identification of MRD. Post-treatment ctDNA detection preceded radiographic progression in 72% of patients by a median of 5.2 months and 53% of patients harbored ctDNA mutation profiles associated with favorable responses to tyrosine kinase inhibitors or immune checkpoint blockade. Collectively, these results indicate that ctDNA MRD in lung cancer patients can be accurately detected using CAPP-Seq and may allow personalized adjuvant treatment while disease burden is lowest.
View details for DOI 10.1158/2159-8290.CD-17-0716
View details for PubMedID 28899864
Development and Validation of Biopsy-Free Genotyping for Molecular Subtyping of Diffuse Large B-Cell Lymphoma
58th Annual Meeting and Exposition of the American Society of Hematology
AMER SOC HEMATOLOGY. 2016
View details for Web of Science ID 000394446803093
An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
2015; 12 (6): 1304-1321
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete, nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.
View details for DOI 10.1109/TCBB.2015.2424407
View details for Web of Science ID 000368292400011
View details for PubMedID 26671803
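The Dirichlet-Multinomial conjugacy that the abstract above relies on can be shown in a few lines of Python. This is only a sketch of the standard conjugate update, in which observed bin counts are added to the Dirichlet parameters. It does not reproduce the paper's optimization paradigms for deriving the prior parameters from pathway constraints. The function names and the numeric prior in the usage note are hypothetical.

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus multinomial bin counts.

    Returns the parameters of the posterior Dirichlet distribution.
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Expected bin probabilities under a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]
```

For example, an assumed prior of [2.0, 1.0, 1.0] combined with observed counts [0, 3, 1] yields the posterior parameters [2.0, 4.0, 2.0]. With few observations the prior still carries noticeable weight in the posterior mean, which is why prior construction matters for small samples.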
- Discrete optimal Bayesian classification with error-conditioned sequential sampling PATTERN RECOGNITION 2015; 48 (11): 3766-3782
- Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11 (1): 202-218
Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach
2013; 23 (11): 1928-1937
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.
View details for DOI 10.1101/gr.157420.113
View details for Web of Science ID 000326642500016
View details for PubMedID 23950146
- Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification PATTERN RECOGNITION 2013; 46 (10): 2783-2797
Identification and Analysis of the First 2009 Pandemic H1N1 Influenza Virus from US Feral Swine
ZOONOSES AND PUBLIC HEALTH
2013; 60 (5): 327-335
The first case of pandemic H1N1 influenza (pH1N1) virus in feral swine in the United States was identified in Texas through the United States Department of Agriculture (USDA) Wildlife Services' surveillance program. Two samples were identified as pandemic influenza by reverse transcriptase quantitative PCR (RT-qPCR). Full-genome Sanger sequencing of all eight influenza segments was performed. In addition, Illumina deep sequencing of the original diagnostic samples and their respective virus isolation cultures was performed to assess the feasibility of using an unbiased whole-genome linear target amplification method and multiple sample sequencing in a single Illumina GAIIx lane. Identical sequences were obtained using both techniques. Phylogenetic analysis indicated that all gene segments belonged to the pH1N1 (2009) lineage. In conclusion, we have identified the first pH1N1 isolate in feral swine in the United States and have demonstrated the use of an easy unbiased linear amplification method for deep sequencing of multiple samples.
View details for DOI 10.1111/zph.12006
View details for Web of Science ID 000321666200002
View details for PubMedID 22978260
Probabilistic reconstruction of the tumor progression process in gene regulatory networks in the presence of uncertainty
8th Annual Conference of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS)
BIOMED CENTRAL LTD. 2011
Accumulation of gene mutations in cells is known to be responsible for tumor progression, driving it from benign states to malignant states. However, previous studies have shown that the detailed sequence of gene mutations, or the steps in tumor progression, may vary from tumor to tumor, making it difficult to infer the exact path that a given type of tumor may have taken. In this paper, we propose an effective probabilistic algorithm for reconstructing the tumor progression process based on partial knowledge of the underlying gene regulatory network and the steady state distribution (SSD) of the gene expression values in a given tumor. We take the BNp (Boolean networks with perturbation) framework to model the gene regulatory networks. We assume that the true network is not exactly known but we are given an uncertainty class of networks that contains the true network. This network uncertainty class arises from our partial knowledge of the true network, typically represented as a set of local pathways that are embedded in the global network. Given the SSD of the cancerous network, we aim to simultaneously identify the true normal (healthy) network and the set of gene mutations that drove the network into the cancerous state. This is achieved by analyzing the effect of gene mutation on the SSD of a gene regulatory network. At each step, the proposed algorithm reduces the uncertainty class by keeping only those networks whose SSDs get close enough to the cancerous SSD as a result of additional gene mutation. These steps are repeated until we can find the best candidate for the true network and the most probable path of tumor progression. Simulation results based on both synthetic networks and networks constructed from actual pathway knowledge show that the proposed algorithm can identify the normal network and the actual path of tumor progression with high probability. The algorithm is also robust to model mismatch and allows us to control the trade-off between efficiency and accuracy.
View details for DOI 10.1186/1471-2105-12-S10-S9
View details for Web of Science ID 000303933600009
View details for PubMedID 22166046
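The steady state distribution of a Boolean network with perturbation, the quantity the abstract above compares between normal and cancerous networks, can be computed for a small network with the following Python sketch. It assumes the common BNp formulation in which each gene flips independently with probability p at every step and the Boolean update rule applies only when no gene flips. The paper's uncertainty-class reduction and mutation search are not reproduced. The two-gene rule toy_update and all function names are hypothetical.

```python
from itertools import product


def bnp_transition_matrix(update, n, p):
    """Markov transition matrix of an n-gene Boolean network with perturbation.

    update maps a state tuple of 0/1 values to the next state tuple.
    p is the assumed independent per-gene flip probability.
    """
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    size = len(states)
    matrix = [[0.0] * size for _ in range(size)]
    for x in states:
        i = index[x]
        # No gene is perturbed: follow the Boolean update rule.
        matrix[i][index[tuple(update(x))]] += (1.0 - p) ** n
        # At least one gene is perturbed: move to x with those bits flipped.
        for y in states:
            h = sum(a != b for a, b in zip(x, y))  # Hamming distance
            if h > 0:
                matrix[i][index[y]] += (p ** h) * ((1.0 - p) ** (n - h))
    return states, matrix


def steady_state(matrix, iters=10000, tol=1e-12):
    """Approximate the stationary distribution by power iteration."""
    size = len(matrix)
    pi = [1.0 / size] * size
    for _ in range(iters):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(size))
               for j in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def toy_update(x):
    """Hypothetical two-gene rule with a single fixed point at (1, 0)."""
    return (x[0] | x[1], 1 - x[0])


states, matrix = bnp_transition_matrix(toy_update, n=2, p=0.01)
ssd = dict(zip(states, steady_state(matrix)))
```

With a small perturbation probability, most of the steady state mass sits on the attractor of the unperturbed network, here the fixed point (1, 0). Changing the update rule to mimic a mutation, for example by forcing one gene to a constant value, shifts that mass, which is the kind of SSD change the abstract describes analyzing.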