With a primary focus on high-dimensional data, I have significant expertise in developing machine learning tools. Much of my work involves constructing Bayesian models, which effectively convert 'prior knowledge', either inherent in the dataset or obtained from external sources, into mathematical terms—more specifically, prior probabilities.

My recent research has centered on the analysis of genetic and epigenetic signals in cell-free DNA (cfDNA) assays. This interest in epigenetics led to the development of EPIC-seq, a technique that infers gene expression from cfDNA fragmentation profiles.

Traditional computational methods in cancer genomics often fall short when the signal-to-noise ratio is exceedingly low, a common scenario in cfDNA analyses. Devising robust methods that overcome this limitation is a research area that I'm deeply committed to and actively engaged in.

Academic Appointments

Honors & Awards

  • T32 Training Program Fellowship, Stanford Medicine (Department of Radiation Oncology) (09/2018-)
  • Cancer Systems Biology Program Fellowship (NIH-R25), Stanford University (09/2015-08/2017)
  • NCI Speaker/Travel Award, NCI (Systems Analysis of Cancer Biology conference) (04/2016)

Professional Education

  • PhD, Texas A&M University, Electrical Engineering, Machine Learning (2014)
  • MSc, Sharif University of Technology, Iran, Electrical Engineering (2009)
  • BSc, University of Tehran, Iran, Electrical Engineering (2007)


  • Mohammad Shahrokh Esfahani. United States. "Methods to Assess Clinical Outcome Based Upon Updated Probabilities and Treatments Thereof"

Research Interests

  • Data Sciences

All Publications

  • Distinct Hodgkin lymphoma subtypes defined by noninvasive genomic profiling. Nature Alig, S. K., Esfahani, M. S., Garofalo, A., Li, M. Y., Rossi, C., Flerlage, T., Flerlage, J. E., Adams, R., Binkley, M. S., Shukla, N., Jin, M. C., Olsen, M., Telenius, A., Mutter, J. A., Schroers-Martin, J. G., Sworder, B. J., Rai, S., King, D. A., Schultz, A., Bögeholz, J., Su, S., Kathuria, K. R., Liu, C. L., Kang, X., Strohband, M. J., Langfitt, D., Pobre-Piza, K. F., Surman, S., Tian, F., Spina, V., Tousseyn, T., Buedts, L., Hoppe, R., Natkunam, Y., Fornecker, L. M., Castellino, S. M., Advani, R., Rossi, D., Lynch, R., Ghesquières, H., Casasnovas, O., Kurtz, D. M., Marks, L. J., Link, M. P., André, M., Vandenberghe, P., Steidl, C., Diehn, M., Alizadeh, A. A. 2023


    The scarcity of malignant Hodgkin and Reed-Sternberg (HRS) cells hampers tissue-based comprehensive genomic profiling of classic Hodgkin lymphoma (cHL). Liquid biopsies, in contrast, show promise for molecular profiling of cHL due to relatively high circulating tumor DNA (ctDNA) levels [1-4]. Here, we show that the plasma representation of mutations exceeds the bulk tumor representation in most cases, making cHL particularly amenable to noninvasive profiling. Leveraging single-cell transcriptional profiles of cHL tumors, we demonstrate HRS ctDNA shedding to be shaped by DNASE1L3, whose increased tumor microenvironment-derived expression drives high ctDNA concentrations. Using this insight, we comprehensively profile 366 patients, revealing two distinct cHL genomic subtypes with characteristic clinical and prognostic correlates, as well as distinct transcriptional and immunological profiles. Furthermore, we identify a novel class of truncating IL4R-mutations that are dependent on IL13 signaling and therapeutically targetable with IL4R blocking antibodies. Finally, using PhasED-Seq [5] we demonstrate the clinical value of pre- and on-treatment ctDNA levels for longitudinally refining cHL risk prediction, and for detection of radiographically occult minimal residual disease. Collectively, these results support the utility of noninvasive strategies for genotyping and dynamic monitoring of cHL as well as capturing molecularly distinct subtypes with diagnostic, prognostic, and therapeutic potential.

    View details for DOI 10.1038/s41586-023-06903-x

    View details for PubMedID 38081297

  • Inferring gene expression from cell-free DNA fragmentation profiles. Nature biotechnology Esfahani, M. S., Hamilton, E. G., Mehrmohamadi, M., Nabet, B. Y., Alig, S. K., King, D. A., Steen, C. B., Macaulay, C. W., Schultz, A., Nesselbush, M. C., Soo, J., Schroers-Martin, J. G., Chen, B., Binkley, M. S., Stehr, H., Chabon, J. J., Sworder, B. J., Hui, A. B., Frank, M. J., Moding, E. J., Liu, C. L., Newman, A. M., Isbell, J. M., Rudin, C. M., Li, B. T., Kurtz, D. M., Diehn, M., Alizadeh, A. A. 2022


    Profiling of circulating tumor DNA (ctDNA) in the bloodstream shows promise for noninvasive cancer detection. Chromatin fragmentation features have previously been explored to infer gene expression profiles from cell-free DNA (cfDNA), but current fragmentomic methods require high concentrations of tumor-derived DNA and provide limited resolution. Here we describe promoter fragmentation entropy as an epigenomic cfDNA feature that predicts RNA expression levels at individual genes. We developed 'epigenetic expression inference from cell-free DNA-sequencing' (EPIC-seq), a method that uses targeted sequencing of promoters of genes of interest. Profiling 329 blood samples from 201 patients with cancer and 87 healthy adults, we demonstrate classification of subtypes of lung carcinoma and diffuse large B cell lymphoma. Applying EPIC-seq to serial blood samples from patients treated with PD-(L)1 immune-checkpoint inhibitors, we show that gene expression profiles inferred by EPIC-seq are correlated with clinical response. Our results indicate that EPIC-seq could enable noninvasive, high-throughput tissue-of-origin characterization with diagnostic, prognostic and therapeutic potential.

    View details for DOI 10.1038/s41587-022-01222-4

    View details for PubMedID 35361996
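As a rough illustration of the entropy idea behind promoter fragmentation entropy, the sketch below computes plain Shannon entropy over the cfDNA fragment lengths observed at a single promoter. It is a toy under stated assumptions: the function name `fragment_length_entropy` and the example lengths are hypothetical, and the published EPIC-seq metric may include normalization and modeling steps that the abstract does not describe.

```python
import math
from collections import Counter


def fragment_length_entropy(lengths):
    """Shannon entropy (in bits) of a list of fragment lengths.

    Hypothetical helper for illustration only; it is not the
    EPIC-seq implementation.
    """
    if not lengths:
        raise ValueError("need at least one fragment length")
    counts = Counter(lengths)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed length frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up fragment lengths (base pairs) at two promoters.
uniform_promoter = [167] * 8               # every fragment has the same length
diverse_promoter = [100, 150, 167, 200]    # four equally frequent lengths

print(fragment_length_entropy(uniform_promoter))
print(fragment_length_entropy(diverse_promoter))
```

A promoter whose fragments all share one length scores zero bits, while more diverse fragment lengths give a higher value.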

  • Integrating genomic features for non-invasive early lung cancer detection. Nature Chabon, J. J., Hamilton, E. G., Kurtz, D. M., Esfahani, M. S., Moding, E. J., Stehr, H., Schroers-Martin, J., Nabet, B. Y., Chen, B., Chaudhuri, A. A., Liu, C., Hui, A. B., Jin, M. C., Azad, T. D., Almanza, D., Jeon, Y., Nesselbush, M. C., Keh, L., Bonilla, R. F., Yoo, C. H., Ko, R. B., Chen, E. L., Merriott, D. J., Massion, P. P., Mansfield, A. S., Jen, J., Ren, H. Z., Lin, S. H., Costantino, C. L., Burr, R., Tibshirani, R., Gambhir, S. S., Berry, G. J., Jensen, K. C., West, R. B., Neal, J. W., Wakelee, H. A., Loo, B. W., Kunder, C. A., Leung, A. N., Lui, N. S., Berry, M. F., Shrager, J. B., Nair, V. S., Haber, D. A., Sequist, L. V., Alizadeh, A. A., Diehn, M. 2020
  • Noninvasive Early Identification of Therapeutic Benefit from Immune Checkpoint Inhibition. Cell Nabet, B. Y., Esfahani, M. S., Moding, E. J., Hamilton, E. G., Chabon, J. J., Rizvi, H., Steen, C. B., Chaudhuri, A. A., Liu, C. L., Hui, A. B., Almanza, D., Stehr, H., Gojenola, L., Bonilla, R. F., Jin, M. C., Jeon, Y. J., Tseng, D., Liu, C., Merghoub, T., Neal, J. W., Wakelee, H. A., Padda, S. K., Ramchandran, K. J., Das, M., Plodkowski, A. J., Yoo, C., Chen, E. L., Ko, R. B., Newman, A. M., Hellmann, M. D., Alizadeh, A. A., Diehn, M. 2020


    Although treatment of non-small cell lung cancer (NSCLC) with immune checkpoint inhibitors (ICIs) can produce remarkably durable responses, most patients develop early disease progression. Furthermore, initial response assessment by conventional imaging is often unable to identify which patients will achieve durable clinical benefit (DCB). Here, we demonstrate that pre-treatment circulating tumor DNA (ctDNA) and peripheral CD8 T cell levels are independently associated with DCB. We further show that ctDNA dynamics after a single infusion can aid in identification of patients who will achieve DCB. Integrating these determinants, we developed and validated an entirely noninvasive multiparameter assay (DIREct-On, Durable Immunotherapy Response Estimation by immune profiling and ctDNA-On-treatment) that robustly predicts which patients will achieve DCB with higher accuracy than any individual feature. Taken together, these results demonstrate that integrated ctDNA and circulating immune cell profiling can provide accurate, noninvasive, and early forecasting of ultimate outcomes for NSCLC patients receiving ICIs.

    View details for DOI 10.1016/j.cell.2020.09.001

    View details for PubMedID 33007267

  • Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction. Cell Kurtz, D. M., Esfahani, M. S., Scherer, F., Soo, J., Jin, M. C., Liu, C. L., Newman, A. M., Duhrsen, U., Huttmann, A., Casasnovas, O., Westin, J. R., Ritgen, M., Bottcher, S., Langerak, A. W., Roschewski, M., Wilson, W. H., Gaidano, G., Rossi, D., Bahlo, J., Hallek, M., Tibshirani, R., Diehn, M., Alizadeh, A. A. 2019


    Accurate prediction of long-term outcomes remains a challenge in the care of cancer patients. Due to the difficulty of serial tumor sampling, previous prediction tools have focused on pretreatment factors. However, emerging non-invasive diagnostics have increased opportunities for serial tumor assessments. We describe the Continuous Individualized Risk Index (CIRI), a method to dynamically determine outcome probabilities for individual patients utilizing risk predictors acquired over time. Similar to "win probability" models in other fields, CIRI provides a real-time probability by integrating risk assessments throughout a patient's course. Applying CIRI to patients with diffuse large B cell lymphoma, we demonstrate improved outcome prediction compared to conventional risk models. We demonstrate CIRI's broader utility in analogous models of chronic lymphocytic leukemia and breast adenocarcinoma and perform a proof-of-concept analysis demonstrating how CIRI could be used to develop predictive biomarkers for therapy selection. We envision that dynamic risk assessment will facilitate personalized medicine and enable innovative therapeutic paradigms.

    View details for DOI 10.1016/j.cell.2019.06.011

    View details for PubMedID 31280963

  • Functional significance of U2AF1 S34F mutations in lung adenocarcinomas. Nature communications Esfahani, M. S., Lee, L. J., Jeon, Y. J., Flynn, R. A., Stehr, H., Hui, A. B., Ishisoko, N., Kildebeck, E., Newman, A. M., Bratman, S. V., Porteus, M. H., Chang, H. Y., Alizadeh, A. A., Diehn, M. 2019; 10 (1): 5712


    The functional role of U2AF1 mutations in lung adenocarcinomas (LUADs) remains incompletely understood. Here, we report a significant co-occurrence of U2AF1 S34F mutations with ROS1 translocations in LUADs. To characterize this interaction, we profiled effects of S34F on the transcriptome-wide distribution of RNA binding and alternative splicing in cells harboring the ROS1 translocation. Compared to its wild-type counterpart, U2AF1 S34F preferentially binds and modulates splicing of introns containing CAG trinucleotides at their 3' splice junctions. The presence of S34F caused a shift in cross-linking at 3' splice sites, which was significantly associated with alternative splicing of skipped exons. U2AF1 S34F induced expression of genes involved in the epithelial-mesenchymal transition (EMT) and increased tumor cell invasion. Finally, S34F increased splicing of the long over the short SLC34A2-ROS1 isoform, which was also associated with enhanced invasiveness. Taken together, our results suggest a mechanistic interaction between mutant U2AF1 and ROS1 in LUAD.

    View details for DOI 10.1038/s41467-019-13392-y

    View details for PubMedID 31836708

  • Effect of separate sampling on classification accuracy. Bioinformatics Shahrokh Esfahani, M., Dougherty, E. R. 2014; 30 (2): 242-250


    Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at:

    View details for DOI 10.1093/bioinformatics/btt662

    View details for PubMedID 24257187
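The effect of a sampling ratio that differs from the population class ratio can be sketched with a deliberately idealized example. The code below assumes two known one-dimensional Gaussian classes, N(0, 1) and N(d, 1), and a classifier that plugs the sampling ratio in as if it were the class prior. All names and numbers are hypothetical, it is written in Python rather than MATLAB, and it is not the `mmratio` function mentioned in the abstract.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def threshold(assumed_prior0, d):
    """Decision threshold between N(0, 1) (class 0) and N(d, 1) (class 1).

    Points above the threshold are labeled class 1. The threshold is the
    Bayes rule for the *assumed* class-0 prior.
    """
    return d / 2.0 + math.log(assumed_prior0 / (1.0 - assumed_prior0)) / d


def true_error(assumed_prior0, true_prior0, d):
    """Error of the assumed-prior classifier under the true class prior."""
    t = threshold(assumed_prior0, d)
    miss0 = 1.0 - normal_cdf(t)   # class-0 points labeled class 1
    miss1 = normal_cdf(t - d)     # class-1 points labeled class 0
    return true_prior0 * miss0 + (1.0 - true_prior0) * miss1


d = 1.0             # hypothetical distance between class means
true_prior0 = 0.9   # hypothetical population proportion of class 0

# Sampling ratios treated as if they were the class-0 prior.
for sampling_ratio in (0.9, 0.5, 0.1):
    error = true_error(sampling_ratio, true_prior0, d)
    print(f"sampling ratio {sampling_ratio}: error {error:.4f}")
```

In this toy setting the error is smallest when the sampling ratio matches the population ratio and grows as the two diverge, which is the qualitative behavior the abstract describes.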

  • Determinants of resistance to engineered T cell therapies targeting CD19 in large B cell lymphomas. Cancer cell Sworder, B. J., Kurtz, D. M., Alig, S. K., Frank, M. J., Shukla, N., Garofalo, A., Macaulay, C. W., Shahrokh Esfahani, M., Olsen, M. N., Hamilton, J., Hosoya, H., Hamilton, M., Spiegel, J. Y., Baird, J. H., Sugio, T., Carleton, M., Craig, A. F., Younes, S. F., Sahaf, B., Sheybani, N. D., Schroers-Martin, J. G., Liu, C. L., Oak, J. S., Jin, M. C., Beygi, S., Hüttmann, A., Hanoun, C., Dührsen, U., Westin, J. R., Khodadoust, M. S., Natkunam, Y., Majzner, R. G., Mackall, C. L., Diehn, M., Miklos, D. B., Alizadeh, A. A. 2022


    Most relapsed/refractory large B cell lymphoma (r/rLBCL) patients receiving anti-CD19 chimeric antigen receptor (CAR19) T cells relapse. To characterize determinants of resistance, we profiled over 700 longitudinal specimens from two independent cohorts (n = 65 and n = 73) of r/rLBCL patients treated with axicabtagene ciloleucel. A method for simultaneous profiling of circulating tumor DNA (ctDNA), cell-free CAR19 (cfCAR19) retroviral fragments, and cell-free T cell receptor rearrangements (cfTCR) enabled integration of tumor and both engineered and non-engineered T cell effector-mediated factors for assessing treatment failure and predicting outcomes. Alterations in multiple classes of genes are associated with resistance, including B cell identity (PAX5 and IRF8), immune checkpoints (CD274), and those affecting the microenvironment (TMEM30A). Somatic tumor alterations affect CAR19 therapy at multiple levels, including CAR19 T cell expansion, persistence, and tumor microenvironment. Further, CAR19 T cells play a reciprocal role in shaping tumor genotype and phenotype. We envision these findings will facilitate improved chimeric antigen receptor (CAR) T cells and personalized therapeutic approaches.

    View details for DOI 10.1016/j.ccell.2022.12.005

    View details for PubMedID 36584673

  • Circulating Tumor DNA Profiling for Detection, Risk Stratification, and Classification of Brain Lymphomas. Journal of clinical oncology : official journal of the American Society of Clinical Oncology Mutter, J. A., Alig, S. K., Esfahani, M. S., Lauer, E. M., Mitschke, J., Kurtz, D. M., Kühn, J., Bleul, S., Olsen, M., Liu, C. L., Jin, M. C., Macaulay, C. W., Neidert, N., Volk, T., Eisenblaetter, M., Rauer, S., Heiland, D. H., Finke, J., Duyster, J., Wehrle, J., Prinz, M., Illerhaus, G., Reinacher, P. C., Schorb, E., Diehn, M., Alizadeh, A. A., Scherer, F. 2022: JCO2200826


    Clinical outcomes of patients with CNS lymphomas (CNSLs) are remarkably heterogeneous, yet identification of patients at high risk for treatment failure is challenging. Furthermore, CNSL diagnosis often remains unconfirmed because of contraindications for invasive stereotactic biopsies. Therefore, improved biomarkers are needed to better stratify patients into risk groups, predict treatment response, and noninvasively identify CNSL. We explored the value of circulating tumor DNA (ctDNA) for early outcome prediction, measurable residual disease monitoring, and surgery-free CNSL identification by applying ultrasensitive targeted next-generation sequencing to a total of 306 tumor, plasma, and CSF specimens from 136 patients with brain cancers, including 92 patients with CNSL. Before therapy, ctDNA was detectable in 78% of plasma and 100% of CSF samples. Patients with positive ctDNA in pretreatment plasma had significantly shorter progression-free survival (PFS, P < .0001, log-rank test) and overall survival (OS, P = .0001, log-rank test). In multivariate analyses including established clinical and radiographic risk factors, pretreatment plasma ctDNA concentrations were independently prognostic of clinical outcomes (PFS HR, 1.4; 95% CI, 1.0 to 1.9; P = .03; OS HR, 1.6; 95% CI, 1.1 to 2.2; P = .006). Moreover, measurable residual disease detection by plasma ctDNA monitoring during treatment identified patients with particularly poor prognosis following curative-intent immunochemotherapy (PFS, P = .0002; OS, P = .004, log-rank test). Finally, we developed a proof-of-principle machine learning approach for biopsy-free CNSL identification from ctDNA, showing sensitivities of 59% (CSF) and 25% (plasma) with high positive predictive value. We demonstrate robust and ultrasensitive detection of ctDNA at various disease milestones in CNSL. Our findings highlight the role of ctDNA as a noninvasive biomarker and its potential value for personalized risk stratification and treatment guidance in patients with CNSL.

    View details for DOI 10.1200/JCO.22.00826

    View details for PubMedID 36542815

  • Genomic Profiling of Bronchoalveolar Lavage Fluid in Lung Cancer. Cancer research Nair, V. S., Hui, A. B., Chabon, J. J., Shahrokh Esfahani, M., Stehr, H., Nabet, B. Y., Zhou, L., Chaudhuri, A. A., Benson, J. A., Ayers, K., Bedi, H., Ramsey, M. C., Van Wert, R., Antic, S., Lui, N. S., Backhus, L. M., Berry, M. F., Sung, A. W., Massion, P. P., Shrager, J. B., Alizadeh, A. A., Diehn, M. 2022


    Genomic profiling of Bronchoalveolar Lavage (BAL) samples may be useful for tumor profiling and diagnosis in the clinic. Here, we compared tumor-derived mutations detected in BAL samples from subjects with non-small cell lung cancer (NSCLC) to those detected in matched plasma samples. CAncer Personalized Profiling by deep Sequencing (CAPP-Seq) was used to genotype DNA purified from BAL, plasma and tumor samples from patients with NSCLC. The characteristics of cell-free DNA (cfDNA) isolated from BAL fluid were first characterized to optimize the technical approach. Somatic mutations identified in tumor were then compared to those identified in BAL and plasma, and the potential of BAL cfDNA analysis to distinguish lung cancer patients from risk-matched controls was explored. In total, 200 biofluid and tumor samples from 38 cases and 21 controls undergoing BAL for lung cancer evaluation were profiled. More tumor variants were identified in BAL cfDNA than plasma cfDNA in all stages (p<0.001) and in stage I-II disease only. Four of 21 controls harbored low levels of cancer-associated driver mutations in BAL cfDNA (mean VAF=0.5%), suggesting the presence of somatic mutations in non-malignant airway cells. Finally, using a Random Forest model with leave-one-out cross validation, an exploratory BAL genomic classifier identified lung cancer with 69% sensitivity and 100% specificity in this cohort and detected more cancers than BAL cytology. Detecting tumor-derived mutations by targeted sequencing of BAL cfDNA is technically feasible and appears to be more sensitive than plasma profiling. Further studies are required to define optimal diagnostic applications and clinical utility.

    View details for DOI 10.1158/0008-5472.CAN-22-0554

    View details for PubMedID 35748739

  • Enhanced detection of minimal residual disease by targeted sequencing of phased variants in circulating tumor DNA. Nature biotechnology Kurtz, D. M., Soo, J., Co Ting Keh, L., Alig, S., Chabon, J. J., Sworder, B. J., Schultz, A., Jin, M. C., Scherer, F., Garofalo, A., Macaulay, C. W., Hamilton, E. G., Chen, B., Olsen, M., Schroers-Martin, J. G., Craig, A. F., Moding, E. J., Esfahani, M. S., Liu, C. L., Duhrsen, U., Huttmann, A., Casasnovas, R., Westin, J. R., Roschewski, M., Wilson, W. H., Gaidano, G., Rossi, D., Diehn, M., Alizadeh, A. A. 2021


    Circulating tumor-derived DNA (ctDNA) is an emerging biomarker for many cancers, but the limited sensitivity of current detection methods reduces its utility for diagnosing minimal residual disease. Here we describe phased variant enrichment and detection sequencing (PhasED-seq), a method that uses multiple somatic mutations in individual DNA fragments to improve the sensitivity of ctDNA detection. Leveraging whole-genome sequences from 2,538 tumors, we identify phased variants and their associations with mutational signatures. We show that even without molecular barcodes, the limits of detection of PhasED-seq outperform prior methods, including duplex barcoding, allowing ctDNA detection in the ppm range in participant samples. We profiled 678 specimens from 213 participants with B cell lymphomas, including serial cell-free DNA samples before and during therapy for diffuse large B cell lymphoma. In participants with undetectable ctDNA after two cycles of therapy using a next-generation sequencing-based approach termed cancer personalized profiling by deep sequencing, an additional 25% have ctDNA detectable by PhasED-seq and have worse outcomes. Finally, we demonstrate the application of PhasED-seq to solid tumors.

    View details for DOI 10.1038/s41587-021-00981-w

    View details for PubMedID 34294911

  • Short Diagnosis-to-Treatment Interval Is Associated With Higher Circulating Tumor DNA Levels in Diffuse Large B-Cell Lymphoma. Journal of clinical oncology : official journal of the American Society of Clinical Oncology Alig, S., Macaulay, C. W., Kurtz, D. M., Dührsen, U., Hüttmann, A., Schmitz, C., Jin, M. C., Sworder, B. J., Garofalo, A., Shahrokh Esfahani, M., Nabet, B. Y., Soo, J., Scherer, F., Craig, A. F., Casasnovas, O., Westin, J. R., Gaidano, G., Rossi, D., Roschewski, M., Wilson, W. H., Meignan, M., Diehn, M., Alizadeh, A. A. 2021: JCO2002573


    Patients with Diffuse Large B-cell Lymphoma (DLBCL) in need of immediate therapy are largely under-represented in clinical trials. The diagnosis-to-treatment interval (DTI) has recently been described as a metric to quantify such patient selection bias, with short DTI being associated with adverse risk factors and inferior outcomes. Here, we characterized the relationships between DTI, circulating tumor DNA (ctDNA), conventional risk factors, and clinical outcomes, with the goal of defining objective disease metrics contributing to selection bias. We evaluated pretreatment ctDNA levels in 267 patients with DLBCL treated across multiple centers in Europe and the United States using Cancer Personalized Profiling by Deep Sequencing. Pretreatment ctDNA levels were correlated with DTI, total metabolic tumor volumes (TMTVs), the International Prognostic Index (IPI), and outcome. Short DTI was associated with advanced-stage disease (P < .001) and higher IPI (P < .001). We also found an inverse correlation between DTI and TMTV (RS = -0.37; P < .001). Similarly, pretreatment ctDNA levels were significantly associated with stage, IPI, and TMTV (all P < .001), demonstrating that both DTI and ctDNA reflect disease burden. Notably, patients with shorter DTI had higher pretreatment ctDNA levels (P < .001). Pretreatment ctDNA levels predicted short DTI independent of the IPI (P < .001). Although each risk factor was significantly associated with event-free survival in univariable analysis, ctDNA level was prognostic of event-free survival independent of DTI and IPI in multivariable Cox regression (ctDNA: hazard ratio, 1.5; 95% CI [1.2 to 2.0]; IPI: 1.1 [0.9 to 1.3]; -DTI: 1.1 [1.0 to 1.2]). Short DTI largely reflects baseline tumor burden, which can be objectively measured using pretreatment ctDNA levels. Pretreatment ctDNA levels therefore have utility for quantifying and guarding against selection biases in prospective DLBCL clinical trials.

    View details for DOI 10.1200/JCO.20.02573

    View details for PubMedID 33909455

  • The landscape of tumor cell states and ecosystems in diffuse large B cell lymphoma. Cancer cell Steen, C. B., Luca, B. A., Esfahani, M. S., Azizi, A., Sworder, B. J., Nabet, B. Y., Kurtz, D. M., Liu, C. L., Khameneh, F., Advani, R. H., Natkunam, Y., Myklebust, J. H., Diehn, M., Gentles, A. J., Newman, A. M., Alizadeh, A. A. 2021


    Biological heterogeneity in diffuse large B cell lymphoma (DLBCL) is partly driven by cell-of-origin subtypes and associated genomic lesions, but also by diverse cell types and cell states in the tumor microenvironment (TME). However, dissecting these cell states and their clinical relevance at scale remains challenging. Here, we implemented EcoTyper, a machine-learning framework integrating transcriptome deconvolution and single-cell RNA sequencing, to characterize clinically relevant DLBCL cell states and ecosystems. Using this approach, we identified five cell states of malignant B cells that vary in prognostic associations and differentiation status. We also identified striking variation in cell states for 12 other lineages comprising the TME and forming cell state interactions in stereotyped ecosystems. While cell-of-origin subtypes have distinct TME composition, DLBCL ecosystems capture clinical heterogeneity within existing subtypes and extend beyond cell-of-origin and genotypic classes. These results resolve the DLBCL microenvironment at systems-level resolution and identify opportunities for therapeutic targeting (

    View details for DOI 10.1016/j.ccell.2021.08.011

    View details for PubMedID 34597589

  • Chromatin accessibility patterns in cell-free DNA reveal tumor heterogeneity Esfahani, M., Mehrmohamadi, M., Steen, C. B., Hamilton, E. G., King, D. A., Soo, J., Macaulay, C., Jin, M., Kurtz, D. M., Nabet, B., Moding, E., Chabon, J., Newman, A., Diehn, M., Alizadeh, A. A. AMER ASSOC CANCER RESEARCH. 2020
  • A mid-chemoradiation dynamic risk model integrating tumor features and ctDNA analysis for lung cancer outcome prediction. Moding, E. J., Esfahani, M., Nabet, B., Liu, Y., Chabon, J. J., He, J., Qiao, Y., Xu, T., Yao, L., Gandhi, S., Liao, Z. X., Das, M., Ramchandran, K., Padda, S., Neal, J. W., Wakelee, H. A., Loo, B. W., Lin, S. H., Alizadeh, A. A., Diehn, M. AMER SOC CLINICAL ONCOLOGY. 2020
  • Evaluating upfront high-dose consolidation after R-CHOP for follicular lymphoma by clinical and genetic risk models. Blood advances Alig, S., Jurinovic, V., Shahrokh Esfahani, M., Haebe, S., Passerini, V., Hellmuth, J. C., Gaitzsch, E., Keay, W., Tahiri, N., Zoellner, A., Rosenwald, A., Klapper, W., Stein, H., Feller, A., Ott, G., Staiger, A. M., Horn, H., Hansmann, M. L., Pott, C., Unterhalt, M., Schmidt, C., Dreyling, M., Alizadeh, A. A., Hiddemann, W., Hoster, E., Weigert, O. 2020; 4 (18): 4451–62


    High-dose therapy and autologous stem cell transplantation (HDT/ASCT) is an effective salvage treatment for eligible patients with follicular lymphoma (FL) and early progression of disease (POD). Since the introduction of rituximab, HDT/ASCT is no longer recommended in first remission. We here explored whether consolidative HDT/ASCT improved survival in defined subgroups of previously untreated patients. We report survival analyses of 431 patients who received frontline rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) for advanced FL, and were randomized to receive consolidative HDT/ASCT. We performed targeted genotyping of 157 diagnostic biopsies, and calculated genotype-based risk scores. HDT/ASCT improved failure-free survival (FFS; hazard ratio [HR], 0.8, P = .07; as-treated: HR, 0.7, P = .04), but not overall survival (OS; HR, 1.3, P = .27; as-treated: HR, 1.4, P = .13). High-risk cohorts identified by FL International Prognostic Index (FLIPI), and the clinicogenetic risk models m7-FLIPI and POD within 24 months-prognostic index (POD24-PI) comprised 27%, 18%, and 22% of patients. HDT/ASCT did not significantly prolong FFS in high-risk patients as defined by FLIPI (HR, 0.9; P = .56), m7-FLIPI (HR, 0.9; P = .91), and POD24-PI (HR, 0.8; P = .60). Similarly, OS was not significantly improved. Finally, we used a machine-learning approach to predict benefit from HDT/ASCT by genotypes. Patients predicted to benefit from HDT/ASCT had longer FFS with HDT/ASCT (HR, 0.4; P = .03), but OS did not reach statistical significance. Thus, consolidative HDT/ASCT after frontline R-CHOP did not improve OS in unselected FL patients and subgroups selected by genotype-based risk models.

    View details for DOI 10.1182/bloodadvances.2020002546

    View details for PubMedID 32941649

  • An Atlas of Clinically-Distinct Tumor Cellular Ecosystems in Diffuse Large B Cell Lymphoma Steen, C. B., Luca, B. A., Esfahani, M., Nabet, B. Y., Sworder, B., Farshidfar, F., Shamardani, K., Kurtz, D. M., Liu, C., Advani, R. H., Natkunam, Y., Myklebust, J., Diehn, M., Gentles, A., Newman, A. M., Alizadeh, A. A. AMER SOC HEMATOLOGY. 2019
  • Broad Genomic Profiling of Bronchoalveolar Lavage Fluid in Lung Cancer Nair, V., Hui, A., Chabon, J., Esfahani, M., Stehr, H., Nabet, B., Benson, J., Chaudhuri, A., Zhou, L., Ayers, K., Bedi, H., Ramsey, M., Van Wert, R., Sung, A., Lui, N., Backhus, L., Berry, M., Massion, P., Shrager, J., Alizadeh, A., Diehn, M. ELSEVIER SCIENCE INC. 2019: S747–S748
  • Validated Limited Gene Predictor For Cervical Cancer Lymph Node Metastases Bloomstein, J., Von Eyben, R., Rankin, E., Wang-Chiang, J., David, S., Esfahani, M., Kidd, E. A. ELSEVIER SCIENCE INC. 2019: S50
  • Determining cell type abundance and expression from bulk tissues with digital cytometry. Nature biotechnology Newman, A. M., Steen, C. B., Liu, C., Gentles, A. J., Chaudhuri, A. A., Scherer, F., Khodadoust, M. S., Esfahani, M. S., Luca, B. A., Steiner, D., Diehn, M., Alizadeh, A. A. 2019; 37 (7): 773-+
  • Detection and Surveillance of Bladder Cancer Using Urine Tumor DNA. Cancer discovery Dudley, J. C., Schroers-Martin, J., Lazzareschi, D., Shi, W., Chen, S. B., Esfahani, M. S., Trivedi, D., Chabon, J. J., Chaudhuri, A. A., Stehr, H., Liu, C., Lim, H., Costa, H. A., Nabet, B. Y., Sin, M. Y., Liao, J. C., Alizadeh, A. A., Diehn, M. 2019; 9 (4): 500–509
  • Circulating DNA for Molecular Response Prediction, Characterization of Resistance Mechanisms and Quantification of CAR T-Cells during Axicabtagene Ciloleucel Therapy American Society of Hematology Sworder, B., Kurtz, D. M., Macaulay, C., Frank, M. J., Alig, S., Garofalo, A., Sahaf, B., Esfahani, M. S., Spiegel, J. Y., Oak, J., Beygi, S., Jin, M. C., Chabon, J. J., Khodadoust, M. S., Majzner, R. G., Mackall, C. L., Diehn, M., Miklos, D. B., Alizadeh, A. A. 2019
  • Circulating tumor DNA analysis for detection of minimal residual disease after chemoradiotherapy for localized esophageal cancer. Gastroenterology Azad, T. D., Chaudhuri, A. A., Fang, P., Qiao, Y., Esfahani, M. S., Chabon, J. J., Hamilton, E. G., Yang, Y. D., Lovejoy, A., Newman, A. M., Kurtz, D. M., Jin, M., Schroers-Martin, J., Stehr, H., Liu, C. L., Bik-Yu Hui, A., Patel, V., Maru, D., Lin, S. H., Alizadeh, A. A., Diehn, M. 2019


    Biomarkers are needed to identify patients at risk of tumor progression following chemoradiotherapy for localized esophageal cancer. These could improve identification of patients at risk for cancer progression and selection of therapy. We performed deep sequencing (CAPP-Seq) analyses of plasma cell-free DNA collected from 45 patients before and after chemoradiotherapy for esophageal cancer, as well as DNA from leukocytes, and fixed esophageal tumor biopsies collected during esophagogastroduodenoscopy. Patients were treated from May 2010 through October 2015; 23 patients subsequently underwent esophagectomy and 22 did not undergo surgery. We also sequenced DNA from blood samples from 40 healthy individuals (controls). We analyzed 802 regions of 607 genes for single-nucleotide variants previously associated with esophageal adenocarcinoma or squamous cell carcinoma. Patients underwent imaging analyses 6-8 weeks after chemoradiotherapy and were followed for 5 years. Our primary aim was to determine whether detection of circulating tumor DNA (ctDNA) following chemoradiotherapy is associated with risk of tumor progression (growth of local, regional, or distant tumors, detected by imaging or biopsy). The median proportion of tumor-derived DNA in total cell-free DNA before treatment was 0.07%, indicating that ultrasensitive assays are needed for quantification and analysis of ctDNA from localized esophageal tumors. Detection of ctDNA following chemoradiotherapy was associated with tumor progression (hazard ratio, 18.7; P<.0001), formation of distant metastases (hazard ratio, 32.1; P<.0001), and shorter disease-specific survival times (hazard ratio, 23.1; P<.0001). A higher proportion of patients with tumor progression had new mutations detected in plasma samples collected after chemoradiotherapy than patients without progression (P=.03). Detection of ctDNA after chemoradiotherapy preceded radiographic evidence of tumor progression by an average of 2.8 months. Among patients who received chemoradiotherapy without surgery, combined ctDNA and metabolic imaging analysis predicted progression in 100% of patients with tumor progression, compared with 71% for only ctDNA detection and 57% for only metabolic imaging analysis (P<.001 for comparison of either technique to combined analysis). In an analysis of cell-free DNA in blood samples from patients who underwent chemoradiotherapy for esophageal cancer, detection of ctDNA was associated with tumor progression, metastasis, and disease-specific survival. Analysis of ctDNA might be used to identify patients at highest risk for tumor progression.

    View details for DOI 10.1053/j.gastro.2019.10.039

    View details for PubMedID 31711920

  • Towards Non-Invasive Classification of DLBCL Genetic Subtypes By Ctdna Profiling American Society of Hematology Esfahani, M. S., Alig, S., Kurtz, D. M., Soo, J., Jin, M. C., Macaulay, C., Craig, A., Garofalo, A., Steen, C. B., Scherer, F., Sworder, B., Diehn, M., Alizadeh, A. A. 2019
  • An experimental design framework for Markovian gene regulatory networks under stationary control policy. BMC systems biology Dehghannasiri, R., Shahrokh Esfahani, M., Dougherty, E. R. 2018; 12 (Suppl 8): 137


    BACKGROUND: A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty. RESULTS: In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs), but only in experimental design; the final policy remains an IBR policy. CONCLUSIONS: Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.

    View details for PubMedID 30577732
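
    As a rough illustration of the MOCU idea summarized in the abstract above, the following minimal sketch computes MOCU for a toy problem. It is not the paper's method: the Markovian GRN and its stationary control policies are replaced by a finite set of candidate networks and a finite set of actions, and the cost table, network names, and prior weights are all hypothetical. The IBR action is taken to be the single action with the lowest prior-expected cost, and MOCU is the gap between that expected cost and the expected cost of acting optimally for each network.

```python
def expected_cost(costs, prior, action):
    """Prior-weighted average cost of applying one action to every candidate network."""
    return sum(p * costs[theta][action] for theta, p in prior.items())


def ibr_action(costs, prior):
    """Index of the action with the lowest expected cost under the prior (toy IBR choice)."""
    n_actions = len(next(iter(costs.values())))
    return min(range(n_actions), key=lambda a: expected_cost(costs, prior, a))


def mocu(costs, prior):
    """Expected extra cost of the robust action relative to the per-network optimum."""
    robust = expected_cost(costs, prior, ibr_action(costs, prior))
    oracle = sum(p * min(costs[theta]) for theta, p in prior.items())
    return robust - oracle


# Hypothetical cost table: costs[network][action], e.g. the steady-state mass
# of undesirable states under each action. These numbers are made up.
costs = {"net_A": [0.2, 0.6], "net_B": [0.7, 0.3]}
prior = {"net_A": 0.6, "net_B": 0.4}

print("IBR action:", ibr_action(costs, prior))
print("MOCU:", round(mocu(costs, prior), 4))
```

    In this toy setting, an experiment that fully identifies the network drives MOCU to zero, which is why experiments can be ranked by the MOCU expected to remain after their outcome is observed.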

  • Distinct Chromatin Accessibility Profiles of Lymphoma Subtypes Revealed By Targeted Cell Free DNA Profiling Mehrmohamadi, M., Esfahani, M. S., Soo, J., Scherer, F., Schroers-Martin, J. G., Chen, B., Kurtz, D. M., Hamilton, E., Liu, C., Diehn, M., Alizadeh, A. A. AMER SOC HEMATOLOGY. 2018
  • Noninvasive Genotyping and Monitoring of Classical Hodgkin Lymphoma Jin, M. C., Schroers-Martin, J. G., Kurtz, D. M., Buedts, L., Esfahani, M. S., Macaulay, C., Sworder, B., Soo, J., Glover, C., Roschewski, M., Wilson, W. H., Duhrsen, U., Huettmann, A., Rossi, D., Gaidano, G., Westin, J. R., Maeda, L. S., Advani, R. H., Vandenberghe, P., Diehn, M., Alizadeh, A. A. AMER SOC HEMATOLOGY. 2018
  • Circulating Tumor DNA Measurements As Early Outcome Predictors in Diffuse Large B-Cell Lymphoma JOURNAL OF CLINICAL ONCOLOGY Kurtz, D. M., Scherer, F., Jin, M. C., Soo, J., Craig, A. M., Esfahani, M., Chabon, J. J., Stehr, H., Liu, C., Tibshirani, R., Maeda, L. S., Gupta, N. K., Khodadoust, M. S., Advani, R. H., Levy, R., Newman, A. M., Duehrsen, U., Huettmann, A., Meignan, M., Casasnovas, R., Westin, J. R., Roschewski, M., Wilson, W. H., Gaidano, G., Rossi, D., Diehn, M., Alizadeh, A. A. 2018; 36 (28): 2845-+
  • Circulating Tumor DNA Measurements As Early Outcome Predictors in Diffuse Large B-Cell Lymphoma. Journal of clinical oncology : official journal of the American Society of Clinical Oncology Kurtz, D. M., Scherer, F., Jin, M. C., Soo, J., Craig, A. F., Esfahani, M. S., Chabon, J. J., Stehr, H., Liu, C. L., Tibshirani, R., Maeda, L. S., Gupta, N. K., Khodadoust, M. S., Advani, R. H., Levy, R., Newman, A. M., Duhrsen, U., Huttmann, A., Meignan, M., Casasnovas, R., Westin, J. R., Roschewski, M., Wilson, W. H., Gaidano, G., Rossi, D., Diehn, M., Alizadeh, A. A. 2018: JCO2018785246


    Purpose: Outcomes for patients with diffuse large B-cell lymphoma remain heterogeneous, with existing methods failing to consistently predict treatment failure. We examined the additional prognostic value of circulating tumor DNA (ctDNA) before and during therapy for predicting patient outcomes. Patients and Methods: We studied the dynamics of ctDNA from 217 patients treated at six centers, using a training and validation framework. We densely characterized early ctDNA dynamics during therapy using cancer personalized profiling by deep sequencing to define response-associated thresholds within a discovery set. These thresholds were assessed in two independent validation sets. Finally, we assessed the prognostic value of ctDNA in the context of established risk factors, including the International Prognostic Index and interim positron emission tomography/computed tomography scans. Results: Before therapy, ctDNA was detectable in 98% of patients; pretreatment levels were prognostic in both front-line and salvage settings. In the discovery set, ctDNA levels changed rapidly, with a 2-log decrease after one cycle (early molecular response [EMR]) and a 2.5-log decrease after two cycles (major molecular response [MMR]) stratifying outcomes. In the first validation set, patients receiving front-line therapy achieving EMR or MMR had superior outcomes at 24 months (EMR: EFS, 83% v 50%; P = .0015; MMR: EFS, 82% v 46%; P < .001). EMR also predicted superior 24-month outcomes in patients receiving salvage therapy in the first validation set (EFS, 100% v 13%; P = .011). The prognostic value of EMR and MMR was further confirmed in the second validation set. In multivariable analyses including International Prognostic Index and interim positron emission tomography/computed tomography scans across both cohorts, molecular response was independently prognostic of outcomes, including event-free and overall survival. Conclusion: Pretreatment ctDNA levels and molecular responses are independently prognostic of outcomes in aggressive lymphomas. These risk factors could potentially guide future personalized risk-directed approaches.

    View details for PubMedID 30125215
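
    The EMR and MMR thresholds quoted in the abstract above amount to a simple log10 fold-change rule, sketched below. This is only an illustration of the arithmetic, not the study's analysis pipeline: the function names, the input units, and the choice to treat an undetectable on-treatment level as an unbounded decrease are assumptions made for the example.

```python
import math

EMR_LOG_DROP = 2.0   # 2-log decrease after one cycle (from the abstract)
MMR_LOG_DROP = 2.5   # 2.5-log decrease after two cycles (from the abstract)


def log_fold_decrease(baseline, on_treatment):
    """log10 of baseline / on-treatment ctDNA level; both in the same (hypothetical) units."""
    if baseline <= 0:
        raise ValueError("baseline ctDNA level must be positive")
    if on_treatment < 0:
        raise ValueError("on-treatment ctDNA level cannot be negative")
    if on_treatment == 0:
        # Assumption for this sketch: undetectable ctDNA counts as an unbounded drop.
        return math.inf
    return math.log10(baseline / on_treatment)


def achieved_emr(baseline, after_cycle_1):
    """Early molecular response: at least a 2-log drop after one cycle."""
    return log_fold_decrease(baseline, after_cycle_1) >= EMR_LOG_DROP


def achieved_mmr(baseline, after_cycle_2):
    """Major molecular response: at least a 2.5-log drop after two cycles."""
    return log_fold_decrease(baseline, after_cycle_2) >= MMR_LOG_DROP


# Made-up measurements for a single hypothetical patient.
print(achieved_emr(1000.0, 5.0))
print(achieved_mmr(1000.0, 5.0))
```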

  • Optimal Bayesian Kalman Filtering With Prior Update IEEE TRANSACTIONS ON SIGNAL PROCESSING Dehghannasiri, R., Esfahani, M., Qian, X., Dougherty, E. R. 2018; 66 (8): 1982–96
  • Detection and surveillance of bladder cancer using urine tumor DNA. Cancer discovery Dudley, J. C., Schroers-Martin, J., Lazzareschi, D. V., Shi, W. Y., Chen, S. B., Esfahani, M. S., Trivedi, D., Chabon, J. J., Chaudhuri, A. A., Stehr, H., Liu, C. L., Lim, H., Costa, H. A., Nabet, B. Y., Sin, M. L., Liao, J. C., Alizadeh, A. A., Diehn, M. 2018


    Current regimens for the detection and surveillance of bladder cancer (BLCA) are invasive and have suboptimal sensitivity. Here, we present a novel high-throughput sequencing (HTS) method for detection of urine tumor DNA (utDNA) called utDNA CAPP-Seq (uCAPP-Seq) and apply it to 67 healthy adults and 118 patients with early-stage BLCA who either had urine collected prior to treatment or during surveillance. Using this targeted sequencing approach, we detected a median of 6 mutations per BLCA patient and observed surprisingly frequent mutations of the PLEKHS1 promoter (46%), suggesting these mutations represent a useful biomarker for detection of BLCA. We detected utDNA pre-treatment in 93% of cases using a tumor mutation-informed approach and in 84% when blinded to tumor mutation status, with 96-100% specificity. In the surveillance setting, we detected utDNA in 91% of patients who ultimately recurred, with utDNA detection preceding clinical progression in 92% of cases. uCAPP-Seq outperformed a commonly used ancillary test (UroVysion, p=0.02) and cytology and cystoscopy combined (p ≤ 0.006), detecting 100% of BLCA cases detected by cytology and 82% that cytology missed. Our results indicate that uCAPP-Seq is a promising approach for early detection and surveillance of BLCA.

    View details for PubMedID 30578357

  • Clinical Impact of Somatic Copy Number Alterations in Circulating Tumor DNA from Diverse Lymphoma Subtypes Jin, M., Kurtz, D. M., Esfahani, M. S., Soo, J., Craig, A., Scherer, F., Stehr, H., Schroers-Martin, J. G., Bangs, C., Cherry, A., Natkunam, Y., Roschewski, M., Wilson, W. H., Duehrsen, U., Huttmann, A., Rossi, D., Gaidano, G., Westin, J. R., Advani, R. H., Diehn, M., Alizadeh, A. A. AMER SOC HEMATOLOGY. 2017
  • Constructing Pathway-based Priors Within a Gaussian Mixture Model for Bayesian Regression and Classification. IEEE/ACM transactions on computational biology and bioinformatics Boluki, S., Shahrokh Esfahani, M., Qian, X., Dougherty, E. R. 2017


    Gene-expression-based classification and regression are major concerns in translational genomics. If the feature-label distribution is known, then an optimal classifier can be derived. If the regressor-target distribution is known, then an optimal regression function can be derived. In practice, neither is known, data must be employed, and, for small samples, prior knowledge concerning the feature-label or regressor-target distribution can be used in the learning process. Optimal Bayesian classification and optimal Bayesian regression provide optimality under uncertainty. With optimal Bayesian classification (or regression), uncertainty is treated directly on the feature-label (or regressor-target) distribution. The fundamental engineering problem is prior construction. The Regularized Expected Mean Log-Likelihood Prior (REML) utilizes pathway information and provides viable priors for the feature-label distribution, assuming that the training data contain labels. In practice, the labels may not be observed. This paper extends the REML methodology to a Gaussian mixture model (GMM) when the labels are unknown. Prior construction bundled with prior update via Bayesian sampling results in Monte Carlo approximations to the optimal Bayesian regression function and optimal Bayesian classifier. Simulations demonstrate that the GMM REML prior yields better performance than the EM algorithm for small data sets. We apply it to phenotype classification when the prior knowledge consists of colon cancer pathways.

    View details for DOI 10.1109/TCBB.2017.2778715

    View details for PubMedID 29990066

  • Noninvasive detection of clinically relevant copy number alterations in diffuse large B-cell lymphoma. Jin, M. C., Kurtz, D., Esfahani, M., Scherer, F., Craig, A. M., Soo, J., Khodadoust, M., Saganty, R., Chabon, J. J., Schroers-Martin, J., Stehr, H., Advani, R. H., Rossi, D., Gaidano, G., Westin, J. R., Diehn, M., Alizadeh, A. A. AMER SOC CLINICAL ONCOLOGY. 2017
  • Intrinsically Bayesian Robust Kalman Filter: An Innovation Process Approach IEEE TRANSACTIONS ON SIGNAL PROCESSING Dehghannasiri, R., Esfahani, M. S., Dougherty, E. R. 2017; 65 (10): 2531-2546
  • Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling. Cancer discovery Chaudhuri, A. A., Chabon, J. J., Lovejoy, A. F., Newman, A. M., Stehr, H., Azad, T. D., Khodadoust, M. S., Esfahani, M. S., Liu, C. L., Zhou, L., Scherer, F., Kurtz, D. M., Say, C., Carter, J. N., Merriott, D. J., Dudley, J. C., Binkley, M. S., Modlin, L., Padda, S. K., Gensheimer, M. F., West, R. B., Shrager, J. B., Neal, J. W., Wakelee, H. A., Loo, B. W., Alizadeh, A. A., Diehn, M. 2017


    Identifying molecular residual disease (MRD) after treatment of localized lung cancer could facilitate early intervention and personalization of adjuvant therapies. Here we apply Cancer Personalized Profiling by Deep Sequencing (CAPP-Seq) circulating tumor DNA (ctDNA) analysis to 255 samples from 40 patients treated with curative intent for stage I-III lung cancer and 54 healthy adults. In 94% of evaluable patients experiencing recurrence, ctDNA was detectable in the first post-treatment blood sample, indicating reliable identification of MRD. Post-treatment ctDNA detection preceded radiographic progression in 72% of patients by a median of 5.2 months and 53% of patients harbored ctDNA mutation profiles associated with favorable responses to tyrosine kinase inhibitors or immune checkpoint blockade. Collectively, these results indicate that ctDNA MRD in lung cancer patients can be accurately detected using CAPP-Seq and may allow personalized adjuvant treatment while disease burden is lowest.

    View details for PubMedID 28899864

  • Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors. BMC bioinformatics Boluki, S., Esfahani, M. S., Qian, X., Dougherty, E. R. 2017; 18 (Suppl 14): 552


    Phenotypic classification is problematic because small samples are ubiquitous; and, for these, use of prior knowledge is critical. If knowledge concerning the feature-label distribution (for instance, genetic pathways) is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods in which a classification model is assumed and prior distributions are placed on model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods. The salient problem confronting optimal Bayesian classification is prior construction. In this paper, we propose a new prior construction methodology based on a general framework of constraints in the form of conditional probability statements. We call this prior the maximal knowledge-driven information prior (MKDIP). The new constraint framework is more flexible than our previous methods as it naturally handles the potential inconsistency in archived regulatory relationships and conditioning can be augmented by other knowledge, such as population statistics. We also extend the application of prior construction to a multinomial mixture model when labels are unknown, which often occurs in practice. The performance of the proposed methods is examined on two important pathway families, the mammalian cell-cycle and a set of p53-related pathways, and also on a publicly available gene expression dataset of non-small cell lung cancer when combined with the existing prior knowledge on relevant signaling pathways. The new proposed general prior construction framework extends the prior construction methodology to a more flexible framework that results in better inference when proper prior knowledge exists. Moreover, the extension of optimal Bayesian classification to multinomial mixtures where data sets are both small and unlabeled enables superior classifier design using small, unstructured data sets. We have demonstrated the effectiveness of our approach using pathway information and available knowledge of gene regulating functions; however, the underlying theory can be applied to a wide variety of knowledge types, and other applications when there are small samples.

    View details for PubMedID 29297278

  • Development and Validation of Biopsy-Free Genotyping for Molecular Subtyping of Diffuse Large B-Cell Lymphoma 58th Annual Meeting and Exposition of the American-Society-of-Hematology Scherer, F., Kurtz, D. M., Newman, A. M., Esfahani, M. S., Craig, A., Stehr, H., Lovejoy, A. F., Chabon, J. J., Liu, C. L., Zhou, L., Glover, C., Visser, B. C., Poultsides, G., Advani, R. H., Maeda, L. S., Gupta, N. K., Levy, R., Ohgami, R. S., Davis, E. R., Gaidano, G., Kunder, C. A., Rossi, D., Westin, J. R., Diehn, M., Alizadeh, A. A. AMER SOC HEMATOLOGY. 2016
  • Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA SCIENCE TRANSLATIONAL MEDICINE Scherer, F., Kurtz, D. M., Newman, A. M., Stehr, H., Craig, A. F., Esfahani, M. S., Lovejoy, A. F., Chabon, J. J., Klass, D. M., Liu, C. L., Zhou, L., Glover, C., Visser, B. C., Poultsides, G. A., Advani, R. H., Maeda, L. S., Gupta, N. K., Levy, R., Ohgami, R. S., Kunder, C. A., Diehn, M., Alizadeh, A. A. 2016; 8 (364)


    Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy.

    View details for DOI 10.1126/scitranslmed.aai8545

    View details for PubMedID 27831904

  • Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients NATURE COMMUNICATIONS Chabon, J. J., Simmons, A. D., Lovejoy, A. F., Esfahani, M. S., Newman, A. M., Haringsma, H. J., Kurtz, D. M., Stehr, H., Scherer, F., Karlovich, C. A., Harding, T. C., Durkin, K. A., Otterson, G. A., Purcell, W. T., Camidge, D. R., Goldman, J. W., Sequist, L. V., Piotrowska, Z., Wakelee, H. A., Neal, J. W., Alizadeh, A. A., Diehn, M. 2016; 7


    Circulating tumour DNA (ctDNA) analysis facilitates studies of tumour heterogeneity. Here we employ CAPP-Seq ctDNA analysis to study resistance mechanisms in 43 non-small cell lung cancer (NSCLC) patients treated with the third-generation epidermal growth factor receptor (EGFR) inhibitor rociletinib. We observe multiple resistance mechanisms in 46% of patients after treatment with first-line inhibitors, indicating frequent intra-patient heterogeneity. Rociletinib resistance recurrently involves MET, EGFR, PIK3CA, ERBB2, KRAS and RB1. We describe a novel EGFR L798I mutation and find that EGFR C797S, which arises in ∼33% of patients after osimertinib treatment, occurs in <3% after rociletinib. Increased MET copy number is the most frequent rociletinib resistance mechanism in this cohort and patients with multiple pre-existing mechanisms (T790M and MET) experience inferior responses. Similarly, rociletinib-resistant xenografts develop MET amplification that can be overcome with the MET inhibitor crizotinib. These results underscore the importance of tumour heterogeneity in NSCLC and the utility of ctDNA-based resistance mechanism assessment.

    View details for DOI 10.1038/ncomms11815

    View details for PubMedID 27283993

  • Noninvasive Cancer Classification Using Diverse Genomic Features in Circulating Tumor DNA Esfahani, M., Newman, A. M., Scherer, F., Tibshirani, R., Diehn, M., Alizadeh, A. A., ACM ASSOC COMPUTING MACHINERY. 2016: 516
  • An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS Esfahani, M. S., Dougherty, E. R. 2015; 12 (6): 1304-1321


    Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete, nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.

    View details for DOI 10.1109/TCBB.2015.2424407

    View details for Web of Science ID 000368292400011

    View details for PubMedID 26671803
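
    The abstract above relies on the Dirichlet distribution being conjugate to the Multinomial. The minimal sketch below shows only that standard conjugate update, in which observed bin counts are added to the prior parameters; it does not reproduce the paper's optimization for deriving the prior parameters from pathway constraints. The prior parameters and counts are hypothetical values chosen for the example.

```python
def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus multinomial counts gives Dirichlet(alpha + counts)."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    if any(a <= 0 for a in prior_alpha):
        raise ValueError("Dirichlet parameters must be positive")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    return [a + c for a, c in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Expected bin probabilities under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical prior over four discrete feature states and a small training sample.
prior_alpha = [2.0, 1.0, 1.0, 4.0]
counts = [3, 0, 1, 6]

post_alpha = dirichlet_posterior(prior_alpha, counts)
print("posterior alpha:", post_alpha)
print("posterior mean:", dirichlet_mean(post_alpha))
```

    With small samples the prior parameters dominate the posterior mean, which is why the choice of prior matters so much in this setting.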

  • Discrete optimal Bayesian classification with error-conditioned sequential sampling PATTERN RECOGNITION Broumand, A., Esfahani, M. S., Yoon, B., Dougherty, E. R. 2015; 48 (11): 3766-3782
  • Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS Esfahani, M. S., Dougherty, E. R. 2014; 11 (1): 202-218
  • Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification PATTERN RECOGNITION Esfahani, M. S., Knight, J., Zollanvari, A., Yoon, B., Dougherty, E. R. 2013; 46 (10): 2783-2797


    Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely-used models for the uncertainty classes; ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers that use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website:

    View details for DOI 10.1016/j.patcog.2013.02.017

    View details for Web of Science ID 000320477400014

    View details for PubMedCentralID PMC4535735

  • Identification and Analysis of the First 2009 Pandemic H1N1 Influenza Virus from US Feral Swine ZOONOSES AND PUBLIC HEALTH Clavijo, A., Nikooienejad, A., Esfahani, M. S., Metz, R. P., Schwartz, S., Atashpaz-Gargari, E., DeLiberto, T. J., Lutman, M. W., Pedersen, K., Bazan, L. R., Koster, L. G., Jenkins-Moore, M., Swenson, S. L., Zhang, M., Beckham, T., Johnson, C. D., Bounpheng, M. 2013; 60 (5): 327-335


    The first case of pandemic H1N1 influenza (pH1N1) virus in feral swine in the United States was identified in Texas through the United States Department of Agriculture (USDA) Wildlife Services' surveillance program. Two samples were identified as pandemic influenza by reverse transcriptase quantitative PCR (RT-qPCR). Full-genome Sanger sequencing of all eight influenza segments was performed. In addition, Illumina deep sequencing of the original diagnostic samples and their respective virus isolation cultures was performed to assess the feasibility of using an unbiased whole-genome linear target amplification method and multiple sample sequencing in a single Illumina GAIIx lane. Identical sequences were obtained using both techniques. Phylogenetic analysis indicated that all gene segments belonged to the pH1N1 (2009) lineage. In conclusion, we have identified the first pH1N1 isolate in feral swine in the United States and have demonstrated the use of an easy unbiased linear amplification method for deep sequencing of multiple samples.

    View details for DOI 10.1111/zph.12006

    View details for Web of Science ID 000321666200002

    View details for PubMedID 22978260

  • Effect of Separate Sampling on Classification and the Minimax Criterion Esfahani, M., Dougherty, E. R. IEEE. 2013: 72–73
  • Probabilistic reconstruction of the tumor progression process in gene regulatory networks in the presence of uncertainty 8th Annual Conference of the MidSouth-Computational-Biology-and-Bioinformatics-Society (MCBIOS) Esfahani, M. S., Yoon, B., Dougherty, E. R. BIOMED CENTRAL LTD. 2011


    Accumulation of gene mutations in cells is known to be responsible for tumor progression, driving it from benign states to malignant states. However, previous studies have shown that the detailed sequence of gene mutations, or the steps in tumor progression, may vary from tumor to tumor, making it difficult to infer the exact path that a given type of tumor may have taken. In this paper, we propose an effective probabilistic algorithm for reconstructing the tumor progression process based on partial knowledge of the underlying gene regulatory network and the steady state distribution (SSD) of the gene expression values in a given tumor. We take the BNp (Boolean networks with perturbation) framework to model the gene regulatory networks. We assume that the true network is not exactly known but we are given an uncertainty class of networks that contains the true network. This network uncertainty class arises from our partial knowledge of the true network, typically represented as a set of local pathways that are embedded in the global network. Given the SSD of the cancerous network, we aim to simultaneously identify the true normal (healthy) network and the set of gene mutations that drove the network into the cancerous state. This is achieved by analyzing the effect of gene mutation on the SSD of a gene regulatory network. At each step, the proposed algorithm reduces the uncertainty class by keeping only those networks whose SSDs get close enough to the cancerous SSD as a result of additional gene mutation. These steps are repeated until we can find the best candidate for the true network and the most probable path of tumor progression. Simulation results based on both synthetic networks and networks constructed from actual pathway knowledge show that the proposed algorithm can identify the normal network and the actual path of tumor progression with high probability. The algorithm is also robust to model mismatch and allows us to control the trade-off between efficiency and accuracy.

    View details for DOI 10.1186/1471-2105-12-S10-S9

    View details for Web of Science ID 000303933600009

    View details for PubMedID 22166046

    View details for PubMedCentralID PMC3236852