My expertise is in developing machine learning tools for high-dimensional data. In particular, I develop Bayesian models, where 'prior knowledge', from external sources or inherent to the data set at hand, can be converted into mathematical terms (i.e., prior probabilities). I have recently focused on analyzing genetic and epigenetic signals in cell-free DNA (cfDNA) assays. Traditional computational methods in cancer genomics are limited when the signal-to-noise ratio is ultra-low, which is often the case in cfDNA analyses. Therefore, there is a growing need to develop novel and more powerful methods to overcome this limitation.
Instructor, Medicine - Oncology
Honors & Awards
T32 Training Program Fellowship, Stanford Medicine (Department of Radiation Oncology) (09/2018-)
Cancer Systems Biology Program Fellowship (NIH-R25), Stanford University (09/2015-08/2017)
NCI Speaker/Travel Award, NCI (Systems Analysis of Cancer Biology conference) (04/2016)
BSc, University of Tehran, Iran, Electrical Engineering (2007)
MSc, Sharif University of Technology, Iran, Electrical Engineering (2009)
PhD, Texas A&M University, Electrical Engineering, Machine Learning (2014)
Inferring gene expression from cell-free DNA fragmentation profiles.
Profiling of circulating tumor DNA (ctDNA) in the bloodstream shows promise for noninvasive cancer detection. Chromatin fragmentation features have previously been explored to infer gene expression profiles from cell-free DNA (cfDNA), but current fragmentomic methods require high concentrations of tumor-derived DNA and provide limited resolution. Here we describe promoter fragmentation entropy as an epigenomic cfDNA feature that predicts RNA expression levels at individual genes. We developed 'epigenetic expression inference from cell-free DNA-sequencing' (EPIC-seq), a method that uses targeted sequencing of promoters of genes of interest. Profiling 329 blood samples from 201 patients with cancer and 87 healthy adults, we demonstrate classification of subtypes of lung carcinoma and diffuse large B cell lymphoma. Applying EPIC-seq to serial blood samples from patients treated with PD-(L)1 immune-checkpoint inhibitors, we show that gene expression profiles inferred by EPIC-seq are correlated with clinical response. Our results indicate that EPIC-seq could enable noninvasive, high-throughput tissue-of-origin characterization with diagnostic, prognostic and therapeutic potential.
View details for DOI 10.1038/s41587-022-01222-4
View details for PubMedID 35361996
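As a rough illustration of the idea of a fragmentation entropy, the sketch below computes the Shannon entropy of a fragment-length histogram for one promoter region. It is a minimal sketch under stated assumptions, not the EPIC-seq implementation: the function name `fragment_length_entropy`, the length window of 100 to 300 bp, and the toy inputs are all hypothetical, and the published method may normalize or model the histogram differently.

```python
import math
from collections import Counter


def fragment_length_entropy(lengths, min_len=100, max_len=300):
    """Shannon entropy (in bits) of the fragment-length histogram.

    `lengths` holds the lengths of cfDNA fragments overlapping one region.
    The length window is an illustrative assumption, not a published setting.
    """
    kept = [n for n in lengths if min_len <= n <= max_len]
    if not kept:
        raise ValueError("no fragments in the requested length range")
    counts = Counter(kept)
    total = len(kept)
    # Sum of -p * log2(p) over the observed length bins.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy inputs: one region where every fragment has the same length,
# and one where 50 different lengths are equally represented.
uniform_lengths = [167] * 50
diverse_lengths = list(range(120, 220, 2))

low_entropy = fragment_length_entropy(uniform_lengths)
high_entropy = fragment_length_entropy(diverse_lengths)
```

A region whose fragments all share one length has zero entropy, while a region with many equally likely lengths approaches the maximum, log2 of the number of bins. Only this diversity measure is sketched here; mapping it to expression levels would need the trained model described in the paper.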
- Integrating genomic features for non-invasive early lung cancer detection NATURE 2020
Noninvasive Early Identification of Therapeutic Benefit from Immune Checkpoint Inhibition.
Although treatment of non-small cell lung cancer (NSCLC) with immune checkpoint inhibitors (ICIs) can produce remarkably durable responses, most patients develop early disease progression. Furthermore, initial response assessment by conventional imaging is often unable to identify which patients will achieve durable clinical benefit (DCB). Here, we demonstrate that pre-treatment circulating tumor DNA (ctDNA) and peripheral CD8 T cell levels are independently associated with DCB. We further show that ctDNA dynamics after a single infusion can aid in identification of patients who will achieve DCB. Integrating these determinants, we developed and validated an entirely noninvasive multiparameter assay (DIREct-On, Durable Immunotherapy Response Estimation by immune profiling and ctDNA-On-treatment) that robustly predicts which patients will achieve DCB with higher accuracy than any individual feature. Taken together, these results demonstrate that integrated ctDNA and circulating immune cell profiling can provide accurate, noninvasive, and early forecasting of ultimate outcomes for NSCLC patients receiving ICIs.
View details for DOI 10.1016/j.cell.2020.09.001
View details for PubMedID 33007267
Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction.
Accurate prediction of long-term outcomes remains a challenge in the care of cancer patients. Due to the difficulty of serial tumor sampling, previous prediction tools have focused on pretreatment factors. However, emerging non-invasive diagnostics have increased opportunities for serial tumor assessments. We describe the Continuous Individualized Risk Index (CIRI), a method to dynamically determine outcome probabilities for individual patients utilizing risk predictors acquired over time. Similar to "win probability" models in other fields, CIRI provides a real-time probability by integrating risk assessments throughout a patient's course. Applying CIRI to patients with diffuse large B cell lymphoma, we demonstrate improved outcome prediction compared to conventional risk models. We demonstrate CIRI's broader utility in analogous models of chronic lymphocytic leukemia and breast adenocarcinoma and perform a proof-of-concept analysis demonstrating how CIRI could be used to develop predictive biomarkers for therapy selection. We envision that dynamic risk assessment will facilitate personalized medicine and enable innovative therapeutic paradigms.
View details for DOI 10.1016/j.cell.2019.06.011
View details for PubMedID 31280963
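The notion of a probability that is revised as new risk predictors arrive can be sketched with a sequential Bayes update in odds form. This is only an illustration of dynamic updating in general: the abstract does not specify CIRI's model, and the function names, the baseline probability, and the likelihood ratios below are hypothetical. The sketch also assumes the predictors are conditionally independent given the outcome.

```python
def update_probability(prior, likelihood_ratio):
    """One Bayes update in odds form: posterior odds = prior odds * LR."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def risk_trajectory(baseline, likelihood_ratios):
    """Event probability after each successive observation.

    Assumes (for illustration) that observations are conditionally
    independent given the outcome, so their likelihood ratios multiply.
    """
    probabilities = [baseline]
    for lr in likelihood_ratios:
        probabilities.append(update_probability(probabilities[-1], lr))
    return probabilities


# Hypothetical patient: 30% baseline risk, then an unfavorable result (LR 2),
# a favorable result (LR 0.5), and a strongly unfavorable result (LR 4).
trajectory = risk_trajectory(0.30, [2.0, 0.5, 4.0])
```

In this toy trajectory the second observation cancels the first, returning the estimate to the baseline, and the third raises it to 12/19. Working in odds keeps each update a single multiplication, which is why this form is convenient for serial measurements.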
Functional significance of U2AF1 S34F mutations in lung adenocarcinomas.
2019; 10 (1): 5712
The functional role of U2AF1 mutations in lung adenocarcinomas (LUADs) remains incompletely understood. Here, we report a significant co-occurrence of U2AF1 S34F mutations with ROS1 translocations in LUADs. To characterize this interaction, we profiled effects of S34F on the transcriptome-wide distribution of RNA binding and alternative splicing in cells harboring the ROS1 translocation. Compared to its wild-type counterpart, U2AF1 S34F preferentially binds and modulates splicing of introns containing CAG trinucleotides at their 3' splice junctions. The presence of S34F caused a shift in cross-linking at 3' splice sites, which was significantly associated with alternative splicing of skipped exons. U2AF1 S34F induced expression of genes involved in the epithelial-mesenchymal transition (EMT) and increased tumor cell invasion. Finally, S34F increased splicing of the long over the short SLC34A2-ROS1 isoform, which was also associated with enhanced invasiveness. Taken together, our results suggest a mechanistic interaction between mutant U2AF1 and ROS1 in LUAD.
View details for DOI 10.1038/s41467-019-13392-y
View details for PubMedID 31836708
Effect of separate sampling on classification accuracy.
2014; 30 (2): 242-250
Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.
View details for DOI 10.1093/bioinformatics/btt662
View details for PubMedID 24257187
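The effect of a sampling ratio that differs from the population class ratio can be illustrated with a toy model where everything is known in closed form. The sketch below is not the paper's MATLAB code or its `mmratio` function: it assumes two one-dimensional Gaussian classes, N(0, 1) and N(d, 1), and a classifier that plugs the sampling ratio `r` in as if it were the prior of class 0. The names `true_error`, `r`, `c`, and `d` are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_error(r, c, d=2.0):
    """Population error of a threshold classifier built with assumed prior r.

    Class 0 is N(0, 1) with true prior c; class 1 is N(d, 1).
    The classifier assigns class 1 when x exceeds the threshold t that
    would be Bayes-optimal if the prior of class 0 were r.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    t = d / 2.0 + math.log(r / (1.0 - r)) / d
    error_class0 = 1.0 - phi(t)   # class 0 points above the threshold
    error_class1 = phi(t - d)     # class 1 points below the threshold
    return c * error_class0 + (1.0 - c) * error_class1


# True prior of class 0 is 0.9 in both cases.
matched = true_error(r=0.9, c=0.9)      # sampling ratio equals the prior
mismatched = true_error(r=0.5, c=0.9)   # balanced design, unbalanced population
```

With `r` equal to the true prior the threshold is Bayes-optimal, so `matched` is smaller than `mismatched`. In this symmetric toy model, `r = 0.5` makes the two class-conditional errors equal, so the error no longer depends on `c`; that is the intuition behind a minimax ratio, although the paper's sample-based version is estimated from data rather than from known distributions.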
Genomic Profiling of Bronchoalveolar Lavage Fluid in Lung Cancer.
Genomic profiling of Bronchoalveolar Lavage (BAL) samples may be useful for tumor profiling and diagnosis in the clinic. Here, we compared tumor-derived mutations detected in BAL samples from subjects with non-small cell lung cancer (NSCLC) to those detected in matched plasma samples. CAncer Personalized Profiling by deep Sequencing (CAPP-Seq) was used to genotype DNA purified from BAL, plasma and tumor samples from patients with NSCLC. The characteristics of cell-free DNA (cfDNA) isolated from BAL fluid were first characterized to optimize the technical approach. Somatic mutations identified in tumor were then compared to those identified in BAL and plasma, and the potential of BAL cfDNA analysis to distinguish lung cancer patients from risk-matched controls was explored. In total, 200 biofluid and tumor samples from 38 cases and 21 controls undergoing BAL for lung cancer evaluation were profiled. More tumor variants were identified in BAL cfDNA than plasma cfDNA in all stages (p<0.001) and in stage I-II disease only. Four of 21 controls harbored low levels of cancer-associated driver mutations in BAL cfDNA (mean VAF=0.5%), suggesting the presence of somatic mutations in non-malignant airway cells. Finally, using a Random Forest model with leave-one-out cross validation, an exploratory BAL genomic classifier identified lung cancer with 69% sensitivity and 100% specificity in this cohort and detected more cancers than BAL cytology. Detecting tumor-derived mutations by targeted sequencing of BAL cfDNA is technically feasible and appears to be more sensitive than plasma profiling. Further studies are required to define optimal diagnostic applications and clinical utility.
View details for DOI 10.1158/0008-5472.CAN-22-0554
View details for PubMedID 35748739
Enhanced detection of minimal residual disease by targeted sequencing of phased variants in circulating tumor DNA.
Circulating tumor-derived DNA (ctDNA) is an emerging biomarker for many cancers, but the limited sensitivity of current detection methods reduces its utility for diagnosing minimal residual disease. Here we describe phased variant enrichment and detection sequencing (PhasED-seq), a method that uses multiple somatic mutations in individual DNA fragments to improve the sensitivity of ctDNA detection. Leveraging whole-genome sequences from 2,538 tumors, we identify phased variants and their associations with mutational signatures. We show that even without molecular barcodes, the limits of detection of PhasED-seq outperform prior methods, including duplex barcoding, allowing ctDNA detection in the ppm range in participant samples. We profiled 678 specimens from 213 participants with B cell lymphomas, including serial cell-free DNA samples before and during therapy for diffuse large B cell lymphoma. In participants with undetectable ctDNA after two cycles of therapy using a next-generation sequencing-based approach termed cancer personalized profiling by deep sequencing, an additional 25% have ctDNA detectable by PhasED-seq and have worse outcomes. Finally, we demonstrate the application of PhasED-seq to solid tumors.
View details for DOI 10.1038/s41587-021-00981-w
View details for PubMedID 34294911
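Requiring several mutations on the same DNA fragment can be sketched as follows. This is an illustrative toy, not the PhasED-seq pipeline: fragments are represented as hypothetical dictionaries mapping genomic position to observed base, and the background-rate calculation assumes independent sequencing errors with a made-up per-base error rate.

```python
def supports_phased_variant(fragment_alleles, phased_variant):
    """True only if the fragment carries every alternate allele of the set."""
    return all(
        fragment_alleles.get(position) == alt
        for position, alt in phased_variant.items()
    )


def count_supporting_fragments(fragments, phased_variant):
    """Number of fragments that carry the full phased variant."""
    return sum(supports_phased_variant(f, phased_variant) for f in fragments)


def joint_background_rate(per_base_error, n_variants):
    """Chance that errors mimic all alleles at once, assuming independence."""
    return per_base_error ** n_variants


# Hypothetical phased variant: two alternate alleles 12 bp apart.
phased_variant = {1001: "T", 1013: "G"}
fragments = [
    {1001: "T", 1013: "G"},  # carries both alternate alleles
    {1001: "T", 1013: "A"},  # carries only the first
    {1001: "C", 1013: "A"},  # reference at both positions
    {1001: "T"},             # does not cover the second position
]

n_support = count_supporting_fragments(fragments, phased_variant)
single_rate = joint_background_rate(1e-4, 1)   # illustrative error rate
paired_rate = joint_background_rate(1e-4, 2)
```

Only the first fragment counts as support. Under the independence assumption, demanding two concordant alleles squares the background rate, which is one way to see why phased variants can lower the limit of detection; real error processes need not be independent.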
Short Diagnosis-to-Treatment Interval Is Associated With Higher Circulating Tumor DNA Levels in Diffuse Large B-Cell Lymphoma.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
Patients with Diffuse Large B-cell Lymphoma (DLBCL) in need of immediate therapy are largely under-represented in clinical trials. The diagnosis-to-treatment interval (DTI) has recently been described as a metric to quantify such patient selection bias, with short DTI being associated with adverse risk factors and inferior outcomes. Here, we characterized the relationships between DTI, circulating tumor DNA (ctDNA), conventional risk factors, and clinical outcomes, with the goal of defining objective disease metrics contributing to selection bias. We evaluated pretreatment ctDNA levels in 267 patients with DLBCL treated across multiple centers in Europe and the United States using Cancer Personalized Profiling by Deep Sequencing. Pretreatment ctDNA levels were correlated with DTI, total metabolic tumor volumes (TMTVs), the International Prognostic Index (IPI), and outcome. Short DTI was associated with advanced-stage disease (P < .001) and higher IPI (P < .001). We also found an inverse correlation between DTI and TMTV (RS = -0.37; P < .001). Similarly, pretreatment ctDNA levels were significantly associated with stage, IPI, and TMTV (all P < .001), demonstrating that both DTI and ctDNA reflect disease burden. Notably, patients with shorter DTI had higher pretreatment ctDNA levels (P < .001). Pretreatment ctDNA levels predicted short DTI independent of the IPI (P < .001). Although each risk factor was significantly associated with event-free survival in univariable analysis, ctDNA level was prognostic of event-free survival independent of DTI and IPI in multivariable Cox regression (ctDNA: hazard ratio, 1.5; 95% CI [1.2 to 2.0]; IPI: 1.1 [0.9 to 1.3]; -DTI: 1.1 [1.0 to 1.2]). Short DTI largely reflects baseline tumor burden, which can be objectively measured using pretreatment ctDNA levels. Pretreatment ctDNA levels therefore have utility for quantifying and guarding against selection biases in prospective DLBCL clinical trials.
View details for DOI 10.1200/JCO.20.02573
View details for PubMedID 33909455
The landscape of tumor cell states and ecosystems in diffuse large B cell lymphoma.
Biological heterogeneity in diffuse large B cell lymphoma (DLBCL) is partly driven by cell-of-origin subtypes and associated genomic lesions, but also by diverse cell types and cell states in the tumor microenvironment (TME). However, dissecting these cell states and their clinical relevance at scale remains challenging. Here, we implemented EcoTyper, a machine-learning framework integrating transcriptome deconvolution and single-cell RNA sequencing, to characterize clinically relevant DLBCL cell states and ecosystems. Using this approach, we identified five cell states of malignant B cells that vary in prognostic associations and differentiation status. We also identified striking variation in cell states for 12 other lineages comprising the TME and forming cell state interactions in stereotyped ecosystems. While cell-of-origin subtypes have distinct TME composition, DLBCL ecosystems capture clinical heterogeneity within existing subtypes and extend beyond cell-of-origin and genotypic classes. These results resolve the DLBCL microenvironment at systems-level resolution and identify opportunities for therapeutic targeting (https://ecotyper.stanford.edu/lymphoma).
View details for DOI 10.1016/j.ccell.2021.08.011
View details for PubMedID 34597589
- Chromatin accessibility patterns in cell-free DNA reveal tumor heterogeneity AMER ASSOC CANCER RESEARCH. 2020
A mid-chemoradiation dynamic risk model integrating tumor features and ctDNA analysis for lung cancer outcome prediction.
AMER SOC CLINICAL ONCOLOGY. 2020
View details for Web of Science ID 000560368303378
Evaluating upfront high-dose consolidation after R-CHOP for follicular lymphoma by clinical and genetic risk models.
2020; 4 (18): 4451–62
High-dose therapy and autologous stem cell transplantation (HDT/ASCT) is an effective salvage treatment for eligible patients with follicular lymphoma (FL) and early progression of disease (POD). Since the introduction of rituximab, HDT/ASCT is no longer recommended in first remission. We here explored whether consolidative HDT/ASCT improved survival in defined subgroups of previously untreated patients. We report survival analyses of 431 patients who received frontline rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) for advanced FL, and were randomized to receive consolidative HDT/ASCT. We performed targeted genotyping of 157 diagnostic biopsies, and calculated genotype-based risk scores. HDT/ASCT improved failure-free survival (FFS; hazard ratio [HR], 0.8, P = .07; as-treated: HR, 0.7, P = .04), but not overall survival (OS; HR, 1.3, P = .27; as-treated: HR, 1.4, P = .13). High-risk cohorts identified by FL International Prognostic Index (FLIPI), and the clinicogenetic risk models m7-FLIPI and POD within 24 months-prognostic index (POD24-PI) comprised 27%, 18%, and 22% of patients. HDT/ASCT did not significantly prolong FFS in high-risk patients as defined by FLIPI (HR, 0.9; P = .56), m7-FLIPI (HR, 0.9; P = .91), and POD24-PI (HR, 0.8; P = .60). Similarly, OS was not significantly improved. Finally, we used a machine-learning approach to predict benefit from HDT/ASCT by genotypes. Patients predicted to benefit from HDT/ASCT had longer FFS with HDT/ASCT (HR, 0.4; P = .03), but OS did not reach statistical significance. Thus, consolidative HDT/ASCT after frontline R-CHOP did not improve OS in unselected FL patients and subgroups selected by genotype-based risk models.
View details for DOI 10.1182/bloodadvances.2020002546
View details for PubMedID 32941649
- An Atlas of Clinically-Distinct Tumor Cellular Ecosystems in Diffuse Large B Cell Lymphoma AMER SOC HEMATOLOGY. 2019
Broad Genomic Profiling of Bronchoalveolar Lavage Fluid in Lung Cancer
ELSEVIER SCIENCE INC. 2019: S747–S748
View details for Web of Science ID 000492162204084
- Validated Limited Gene Predictor For Cervical Cancer Lymph Node Metastases ELSEVIER SCIENCE INC. 2019: S50
- Determining cell type abundance and expression from bulk tissues with digital cytometry NATURE BIOTECHNOLOGY 2019; 37 (7): 773-+
- Detection and Surveillance of Bladder Cancer Using Urine Tumor DNA CANCER DISCOVERY 2019; 9 (4): 500–509
Circulating tumor DNA analysis for detection of minimal residual disease after chemoradiotherapy for localized esophageal cancer.
Biomarkers are needed to identify patients at risk of tumor progression following chemoradiotherapy for localized esophageal cancer. These could improve identification of patients at risk for cancer progression and selection of therapy. We performed deep sequencing (CAPP-Seq) analyses of plasma cell-free DNA collected from 45 patients before and after chemoradiotherapy for esophageal cancer, as well as DNA from leukocytes, and fixed esophageal tumor biopsies collected during esophagogastroduodenoscopy. Patients were treated from May 2010 through October 2015; 23 patients subsequently underwent esophagectomy and 22 did not undergo surgery. We also sequenced DNA from blood samples from 40 healthy individuals (controls). We analyzed 802 regions of 607 genes for single-nucleotide variants previously associated with esophageal adenocarcinoma or squamous cell carcinoma. Patients underwent imaging analyses 6-8 weeks after chemoradiotherapy and were followed for 5 years. Our primary aim was to determine whether detection of circulating tumor DNA (ctDNA) following chemoradiotherapy is associated with risk of tumor progression (growth of local, regional, or distant tumors, detected by imaging or biopsy). The median proportion of tumor-derived DNA in total cell-free DNA before treatment was 0.07%, indicating that ultrasensitive assays are needed for quantification and analysis of ctDNA from localized esophageal tumors. Detection of ctDNA following chemoradiotherapy was associated with tumor progression (hazard ratio, 18.7; P<.0001), formation of distant metastases (hazard ratio, 32.1; P<.0001), and shorter disease-specific survival times (hazard ratio, 23.1; P<.0001). A higher proportion of patients with tumor progression had new mutations detected in plasma samples collected after chemoradiotherapy than patients without progression (P=.03). Detection of ctDNA after chemoradiotherapy preceded radiographic evidence of tumor progression by an average of 2.8 months.
Among patients who received chemoradiotherapy without surgery, combined ctDNA and metabolic imaging analysis predicted progression in 100% of patients with tumor progression, compared with 71% for only ctDNA detection and 57% for only metabolic imaging analysis (P<.001 for comparison of either technique to combined analysis). In an analysis of cell-free DNA in blood samples from patients who underwent chemoradiotherapy for esophageal cancer, detection of ctDNA was associated with tumor progression, metastasis, and disease-specific survival. Analysis of ctDNA might be used to identify patients at highest risk for tumor progression.
View details for DOI 10.1053/j.gastro.2019.10.039
View details for PubMedID 31711920
Circulating DNA for Molecular Response Prediction, Characterization of Resistance Mechanisms and Quantification of CAR T-Cells during Axicabtagene Ciloleucel Therapy
American Society of Hematology
View details for DOI 10.1182/blood-2019-129015
Towards Non-Invasive Classification of DLBCL Genetic Subtypes By ctDNA Profiling
American Society of Hematology
View details for DOI 10.1182/blood-2019-132069
An experimental design framework for Markovian gene regulatory networks under stationary control policy.
BMC systems biology
2018; 12 (Suppl 8): 137
BACKGROUND: A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty. RESULTS: In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated.
Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs) - but only in experimental design, the final policy being an IBR policy. CONCLUSIONS: Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.
View details for PubMedID 30577732
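The MOCU-based selection of an experiment can be sketched for a small, finite uncertainty class. This toy replaces the Markovian GRN and its control policies with a plain cost table, so it only illustrates the bookkeeping: `cost[i][a]` is a hypothetical cost of action `a` when model `i` is the true one, and `likelihood[i][o]` is a hypothetical probability of experimental outcome `o` under model `i`. It does not implement the MFPT approximation mentioned in the abstract.

```python
def ibr_action(prior, cost):
    """Action with the lowest expected cost under the current distribution."""
    n_actions = len(cost[0])
    expected = [
        sum(p * row[a] for p, row in zip(prior, cost)) for a in range(n_actions)
    ]
    return min(range(n_actions), key=expected.__getitem__)


def mocu(prior, cost):
    """Expected extra cost of the robust action over each model's best action."""
    a_star = ibr_action(prior, cost)
    return sum(p * (row[a_star] - min(row)) for p, row in zip(prior, cost))


def expected_remaining_mocu(prior, cost, likelihood):
    """MOCU expected to remain after observing one experiment's outcome."""
    total = 0.0
    for o in range(len(likelihood[0])):
        p_outcome = sum(p * lik[o] for p, lik in zip(prior, likelihood))
        if p_outcome == 0.0:
            continue  # this outcome cannot occur under the current prior
        posterior = [p * lik[o] / p_outcome for p, lik in zip(prior, likelihood)]
        total += p_outcome * mocu(posterior, cost)
    return total


# Hypothetical setting: two candidate models, two actions.
prior = [0.6, 0.4]
cost = [[0.1, 0.5],
        [0.6, 0.2]]
experiments = {
    "informative": [[1.0, 0.0], [0.0, 1.0]],    # outcome reveals the model
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],  # outcome carries no information
}

current_mocu = mocu(prior, cost)
remaining = {
    name: expected_remaining_mocu(prior, cost, lik)
    for name, lik in experiments.items()
}
best_experiment = min(remaining, key=remaining.get)
```

Here the current MOCU is 0.16, the informative experiment drives the expected remaining MOCU to zero, and the uninformative one leaves it unchanged, so the informative experiment is selected. In a sequential design the posterior from the chosen experiment would become the next prior and the loop would repeat.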
- Noninvasive Genotyping and Monitoring of Classical Hodgkin Lymphoma AMER SOC HEMATOLOGY. 2018
- Distinct Chromatin Accessibility Profiles of Lymphoma Subtypes Revealed By Targeted Cell Free DNA Profiling AMER SOC HEMATOLOGY. 2018
- Circulating Tumor DNA Measurements As Early Outcome Predictors in Diffuse Large B-Cell Lymphoma JOURNAL OF CLINICAL ONCOLOGY 2018; 36 (28): 2845-+
Circulating Tumor DNA Measurements As Early Outcome Predictors in Diffuse Large B-Cell Lymphoma.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
Purpose: Outcomes for patients with diffuse large B-cell lymphoma remain heterogeneous, with existing methods failing to consistently predict treatment failure. We examined the additional prognostic value of circulating tumor DNA (ctDNA) before and during therapy for predicting patient outcomes. Patients and Methods: We studied the dynamics of ctDNA from 217 patients treated at six centers, using a training and validation framework. We densely characterized early ctDNA dynamics during therapy using cancer personalized profiling by deep sequencing to define response-associated thresholds within a discovery set. These thresholds were assessed in two independent validation sets. Finally, we assessed the prognostic value of ctDNA in the context of established risk factors, including the International Prognostic Index and interim positron emission tomography/computed tomography scans. Results: Before therapy, ctDNA was detectable in 98% of patients; pretreatment levels were prognostic in both front-line and salvage settings. In the discovery set, ctDNA levels changed rapidly, with a 2-log decrease after one cycle (early molecular response [EMR]) and a 2.5-log decrease after two cycles (major molecular response [MMR]) stratifying outcomes. In the first validation set, patients receiving front-line therapy achieving EMR or MMR had superior outcomes at 24 months (EMR: EFS, 83% v 50%; P = .0015; MMR: EFS, 82% v 46%; P < .001). EMR also predicted superior 24-month outcomes in patients receiving salvage therapy in the first validation set (EFS, 100% v 13%; P = .011). The prognostic value of EMR and MMR was further confirmed in the second validation set. In multivariable analyses including International Prognostic Index and interim positron emission tomography/computed tomography scans across both cohorts, molecular response was independently prognostic of outcomes, including event-free and overall survival.
Conclusion: Pretreatment ctDNA levels and molecular responses are independently prognostic of outcomes in aggressive lymphomas. These risk factors could potentially guide future personalized risk-directed approaches.
View details for PubMedID 30125215
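The EMR and MMR thresholds quoted in the abstract (a 2-log decrease after one cycle and a 2.5-log decrease after two cycles) translate into a short calculation on paired ctDNA measurements. The sketch below is a minimal illustration, not the study's analysis code: the function names are hypothetical, the ctDNA values are made up and in arbitrary but consistent units, and treating an undetectable follow-up value as an unbounded drop is an assumption.

```python
import math

EMR_LOG_DROP = 2.0   # early molecular response: after one cycle
MMR_LOG_DROP = 2.5   # major molecular response: after two cycles


def log_drop(baseline, current):
    """Log10 fold decrease from baseline to the current ctDNA level."""
    if baseline <= 0:
        raise ValueError("baseline ctDNA must be detectable (positive)")
    if current <= 0:
        return math.inf  # assumption: undetectable counts as an unbounded drop
    return math.log10(baseline / current)


def molecular_response(baseline, after_cycle1=None, after_cycle2=None):
    """Flag EMR and MMR for whichever follow-up measurements are available."""
    result = {}
    if after_cycle1 is not None:
        result["EMR"] = log_drop(baseline, after_cycle1) >= EMR_LOG_DROP
    if after_cycle2 is not None:
        result["MMR"] = log_drop(baseline, after_cycle2) >= MMR_LOG_DROP
    return result


# Hypothetical patient: a 200-fold drop (about 2.3 logs) at both time points.
response = molecular_response(1000.0, after_cycle1=5.0, after_cycle2=5.0)
```

For this made-up patient the 2.3-log drop meets the EMR threshold but falls short of the 2.5-log MMR threshold, so `response` is `{"EMR": True, "MMR": False}`.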
- Optimal Bayesian Kalman Filtering With Prior Update IEEE TRANSACTIONS ON SIGNAL PROCESSING 2018; 66 (8): 1982–96
Detection and surveillance of bladder cancer using urine tumor DNA.
Current regimens for the detection and surveillance of bladder cancer (BLCA) are invasive and have suboptimal sensitivity. Here, we present a novel high-throughput sequencing (HTS) method for detection of urine tumor DNA (utDNA) called utDNA CAPP-Seq (uCAPP-Seq) and apply it to 67 healthy adults and 118 patients with early-stage BLCA who either had urine collected prior to treatment or during surveillance. Using this targeted sequencing approach, we detected a median of 6 mutations per BLCA patient and observed surprisingly frequent mutations of the PLEKHS1 promoter (46%), suggesting these mutations represent a useful biomarker for detection of BLCA. We detected utDNA pre-treatment in 93% of cases using a tumor mutation-informed approach and in 84% when blinded to tumor mutation status, with 96-100% specificity. In the surveillance setting, we detected utDNA in 91% of patients who ultimately recurred, with utDNA detection preceding clinical progression in 92% of cases. uCAPP-Seq outperformed a commonly used ancillary test (UroVysion, p=0.02) and cytology and cystoscopy combined (p ≤ 0.006), detecting 100% of BLCA cases detected by cytology and 82% that cytology missed. Our results indicate that uCAPP-Seq is a promising approach for early detection and surveillance of BLCA.
View details for PubMedID 30578357
Clinical Impact of Somatic Copy Number Alterations in Circulating Tumor DNA from Diverse Lymphoma Subtypes
AMER SOC HEMATOLOGY. 2017
View details for Web of Science ID 000432419406380
Constructing Pathway-based Priors Within a Gaussian Mixture Model for Bayesian Regression and Classification.
IEEE/ACM transactions on computational biology and bioinformatics
Gene-expression-based classification and regression are major concerns in translational genomics. If the feature-label distribution is known, then an optimal classifier can be derived. If the regressor-target distribution is known, then an optimal regression function can be derived. In practice, neither is known, data must be employed, and, for small samples, prior knowledge concerning the feature-label or regressor-target distribution can be used in the learning process. Optimal Bayesian classification and optimal Bayesian regression provide optimality under uncertainty. With optimal Bayesian classification (or regression), uncertainty is treated directly on the feature-label (or regressor-target) distribution. The fundamental engineering problem is prior construction. The Regularized Expected Mean Log-Likelihood Prior (REML) utilizes pathway information and provides viable priors for the feature-label distribution, assuming that the training data contain labels. In practice, the labels may not be observed. This paper extends the REML methodology to a Gaussian mixture model (GMM) when the labels are unknown. Prior construction bundled with prior update via Bayesian sampling results in Monte Carlo approximations to the optimal Bayesian regression function and optimal Bayesian classifier. Simulations demonstrate that the GMM REML prior yields better performance than the EM algorithm for small data sets. We apply it to phenotype classification when the prior knowledge consists of colon cancer pathways.
View details for DOI 10.1109/TCBB.2017.2778715
View details for PubMedID 29990066
- Noninvasive detection of clinically relevant copy number alterations in diffuse large B-cell lymphoma. AMER SOC CLINICAL ONCOLOGY. 2017
- Intrinsically Bayesian Robust Kalman Filter: An Innovation Process Approach IEEE TRANSACTIONS ON SIGNAL PROCESSING 2017; 65 (10): 2531-2546
Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling.
Identifying molecular residual disease (MRD) after treatment of localized lung cancer could facilitate early intervention and personalization of adjuvant therapies. Here we apply Cancer Personalized Profiling by Deep Sequencing (CAPP-Seq) circulating tumor DNA (ctDNA) analysis to 255 samples from 40 patients treated with curative intent for stage I-III lung cancer and 54 healthy adults. In 94% of evaluable patients experiencing recurrence, ctDNA was detectable in the first post-treatment blood sample, indicating reliable identification of MRD. Post-treatment ctDNA detection preceded radiographic progression in 72% of patients by a median of 5.2 months and 53% of patients harbored ctDNA mutation profiles associated with favorable responses to tyrosine kinase inhibitors or immune checkpoint blockade. Collectively, these results indicate that ctDNA MRD in lung cancer patients can be accurately detected using CAPP-Seq and may allow personalized adjuvant treatment while disease burden is lowest.
View details for PubMedID 28899864
Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors.
2017; 18 (Suppl 14): 552
Phenotypic classification is problematic because small samples are ubiquitous; and, for these, use of prior knowledge is critical. If knowledge concerning the feature-label distribution - for instance, genetic pathways - is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods in which a classification model is assumed and prior distributions are placed on model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods. The salient problem confronting optimal Bayesian classification is prior construction. In this paper, we propose a new prior construction methodology based on a general framework of constraints in the form of conditional probability statements. We call this prior the maximal knowledge-driven information prior (MKDIP). The new constraint framework is more flexible than our previous methods as it naturally handles the potential inconsistency in archived regulatory relationships and conditioning can be augmented by other knowledge, such as population statistics. We also extend the application of prior construction to a multinomial mixture model when labels are unknown, which often occurs in practice. The performance of the proposed methods is examined on two important pathway families, the mammalian cell-cycle and a set of p53-related pathways, and also on a publicly available gene expression dataset of non-small cell lung cancer when combined with the existing prior knowledge on relevant signaling pathways. The new proposed general prior construction framework extends the prior construction methodology to a more flexible framework that results in better inference when proper prior knowledge exists.
Moreover, the extension of optimal Bayesian classification to multinomial mixtures where data sets are both small and unlabeled, enables superior classifier design using small, unstructured data sets. We have demonstrated the effectiveness of our approach using pathway information and available knowledge of gene regulating functions; however, the underlying theory can be applied to a wide variety of knowledge types, and other applications when there are small samples.
View details for PubMedID 29297278
Development and Validation of Biopsy-Free Genotyping for Molecular Subtyping of Diffuse Large B-Cell Lymphoma
58th Annual Meeting and Exposition of the American-Society-of-Hematology
AMER SOC HEMATOLOGY. 2016
View details for Web of Science ID 000394446803093
Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA
SCIENCE TRANSLATIONAL MEDICINE
2016; 8 (364)
Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy.
View details for DOI 10.1126/scitranslmed.aai8545
View details for PubMedID 27831904
Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients
Circulating tumour DNA (ctDNA) analysis facilitates studies of tumour heterogeneity. Here we employ CAPP-Seq ctDNA analysis to study resistance mechanisms in 43 non-small cell lung cancer (NSCLC) patients treated with the third-generation epidermal growth factor receptor (EGFR) inhibitor rociletinib. We observe multiple resistance mechanisms in 46% of patients after treatment with first-line inhibitors, indicating frequent intra-patient heterogeneity. Rociletinib resistance recurrently involves MET, EGFR, PIK3CA, ERBB2, KRAS and RB1. We describe a novel EGFR L798I mutation and find that EGFR C797S, which arises in ∼33% of patients after osimertinib treatment, occurs in <3% after rociletinib. Increased MET copy number is the most frequent rociletinib resistance mechanism in this cohort and patients with multiple pre-existing mechanisms (T790M and MET) experience inferior responses. Similarly, rociletinib-resistant xenografts develop MET amplification that can be overcome with the MET inhibitor crizotinib. These results underscore the importance of tumour heterogeneity in NSCLC and the utility of ctDNA-based resistance mechanism assessment.
View details for DOI 10.1038/ncomms11815
View details for PubMedID 27283993
- Noninvasive Cancer Classification Using Diverse Genomic Features in Circulating Tumor DNA ASSOC COMPUTING MACHINERY. 2016: 516
An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
2015; 12 (6): 1304-1321
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete, nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.
View details for DOI 10.1109/TCBB.2015.2424407
View details for Web of Science ID 000368292400011
View details for PubMedID 26671803
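As an illustration of the conjugacy mentioned in the abstract above, here is a minimal sketch of the standard Dirichlet-Multinomial update. It is not the paper's prior-construction method: the function names, the four-state example, and the prior parameters are all hypothetical, and the prior is simply assumed as given rather than estimated from pathway constraints.

```python
def dirichlet_posterior(alpha_prior, counts):
    """Conjugate update: a Dirichlet(alpha) prior combined with multinomial
    counts n gives a Dirichlet(alpha + n) posterior."""
    if len(alpha_prior) != len(counts):
        raise ValueError("alpha_prior and counts must have the same length")
    return [a + n for a, n in zip(alpha_prior, counts)]


def dirichlet_mean(alpha):
    """Expected bin probabilities under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical example: four discrete states (e.g. two binary genes).
# The prior values are made up; in practice they would encode prior knowledge.
alpha_prior = [2.0, 1.0, 1.0, 4.0]
counts = [3, 0, 1, 6]  # made-up observations per state for one class

alpha_post = dirichlet_posterior(alpha_prior, counts)
post_mean = dirichlet_mean(alpha_post)
print(alpha_post)
print(post_mean)
```

With few observations the posterior mean stays close to the prior mean, which is why the choice of prior parameters matters most in the small-sample setting the abstract describes.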
- Discrete optimal Bayesian classification with error-conditioned sequential sampling PATTERN RECOGNITION 2015; 48 (11): 3766-3782
- Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11 (1): 202-218
Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification
2013; 46 (10): 2783-2797
Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely-used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers that use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.
View details for DOI 10.1016/j.patcog.2013.02.017
View details for Web of Science ID 000320477400014
View details for PubMedCentralID PMC4535735
Identification and Analysis of the First 2009 Pandemic H1N1 Influenza Virus from US Feral Swine
ZOONOSES AND PUBLIC HEALTH
2013; 60 (5): 327-335
The first case of pandemic H1N1 influenza (pH1N1) virus in feral swine in the United States was identified in Texas through the United States Department of Agriculture (USDA) Wildlife Services' surveillance program. Two samples were identified as pandemic influenza by reverse transcriptase quantitative PCR (RT-qPCR). Full-genome Sanger sequencing of all eight influenza segments was performed. In addition, Illumina deep sequencing of the original diagnostic samples and their respective virus isolation cultures was performed to assess the feasibility of using an unbiased whole-genome linear target amplification method and multiple sample sequencing in a single Illumina GAIIx lane. Identical sequences were obtained using both techniques. Phylogenetic analysis indicated that all gene segments belonged to the pH1N1 (2009) lineage. In conclusion, we have identified the first pH1N1 isolate in feral swine in the United States and have demonstrated the use of an easy unbiased linear amplification method for deep sequencing of multiple samples.
View details for DOI 10.1111/zph.12006
View details for Web of Science ID 000321666200002
View details for PubMedID 22978260
Effect of Separate Sampling on Classification and the Minimax Criterion
IEEE. 2013: 72–73
View details for Web of Science ID 000350564300023
Probabilistic reconstruction of the tumor progression process in gene regulatory networks in the presence of uncertainty
8th Annual Conference of the MidSouth-Computational-Biology-and-Bioinformatics-Society (MCBIOS)
BIOMED CENTRAL LTD. 2011
Accumulation of gene mutations in cells is known to be responsible for tumor progression, driving it from benign states to malignant states. However, previous studies have shown that the detailed sequence of gene mutations, or the steps in tumor progression, may vary from tumor to tumor, making it difficult to infer the exact path that a given type of tumor may have taken. In this paper, we propose an effective probabilistic algorithm for reconstructing the tumor progression process based on partial knowledge of the underlying gene regulatory network and the steady-state distribution (SSD) of the gene expression values in a given tumor. We take the BNp (Boolean networks with perturbation) framework to model the gene regulatory networks. We assume that the true network is not exactly known but we are given an uncertainty class of networks that contains the true network. This network uncertainty class arises from our partial knowledge of the true network, typically represented as a set of local pathways that are embedded in the global network. Given the SSD of the cancerous network, we aim to simultaneously identify the true normal (healthy) network and the set of gene mutations that drove the network into the cancerous state. This is achieved by analyzing the effect of gene mutation on the SSD of a gene regulatory network. At each step, the proposed algorithm reduces the uncertainty class by keeping only those networks whose SSDs get close enough to the cancerous SSD as a result of additional gene mutation. These steps are repeated until we can find the best candidate for the true network and the most probable path of tumor progression. Simulation results based on both synthetic networks and networks constructed from actual pathway knowledge show that the proposed algorithm can identify the normal network and the actual path of tumor progression with high probability. The algorithm is also robust to model mismatch and allows us to control the trade-off between efficiency and accuracy.
View details for DOI 10.1186/1471-2105-12-S10-S9
View details for Web of Science ID 000303933600009
View details for PubMedID 22166046
View details for PubMedCentralID PMC3236852