I am a bioinformatics scientist working in cancer immunotherapy. Originally trained as a pharmacist, I went on to complete a PhD in machine learning applied to cancer drug screens. My main areas of interest are cancer immunotherapy and drug discovery. I am currently developing a reverse translational framework for drug discovery in diffuse large B-cell lymphoma (DLBCL) and building an interaction network to capture tumor microenvironment signaling and cell-cell interactions. My overarching goal is to develop new immunotherapies for cancer and to make the drug development process more efficient, rational, and data-driven.

Professional Education

  • Master of Science, Unlisted School (2015)
  • Doctor of Pharmacy, Université de Paris XI (Paris-Sud) (2015)
  • Doctor of Philosophy, Ruprecht Karl Universität Heidelberg (2018)

All Publications

  • Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics CELL SYSTEMS Yang, M., Petralia, F., Li, Z., Li, H., Ma, W., Song, X., Kim, S., Lee, H., Yu, H., Lee, B., Bae, S., Heo, E., Kaczmarczyk, J., Stepniak, P., Warchol, M., Yu, T., Calinawan, A. P., Boutros, P. C., Payne, S. H., Reva, B., Boja, E., Rodriguez, H., Stolovitzky, G., Guan, Y., Kang, J., Wang, P., Fenyo, D., Saez-Rodriguez, J., NCI-CPTAC-DREAM Consortium 2020; 11 (2): 186-+


    Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.

    View details for DOI 10.1016/j.cels.2020.06.013

    View details for Web of Science ID 000563112000007

    View details for PubMedID 32710834

  • Stratification and prediction of drug synergy based on target functional similarity. NPJ systems biology and applications Yang, M., Jaaks, P., Dry, J., Garnett, M., Menden, M. P., Saez-Rodriguez, J. 2020; 6 (1): 16


    Drug combinations can expand therapeutic options and address cancer's resistance. However, the combinatorial space is enormous, precluding its systematic exploration. Therefore, synergy prediction strategies are essential. We here present an approach to prioritise drug combinations in high-throughput screens and to stratify synergistic responses. At the core of our approach is the observation that the likelihood of synergy increases when targeting proteins with either strong functional similarity or dissimilarity. We estimate the similarity applying a multitask machine learning approach to basal gene expression and response to single drugs. We tested 7 protein target pairs (representing 29 combinations) and predicted their synergies in 33 breast cancer cell lines. In addition, we experimentally validated predicted synergy of the BRAF/insulin receptor combination (Dabrafenib/BMS-754807) in 48 colorectal cancer cell lines. We anticipate that our approaches can be used for prioritization of drug combinations in large scale screenings, and to maximize the efficacy of drugs already known to induce synergy, ultimately enabling patient stratification.

    View details for DOI 10.1038/s41540-020-0136-x

    View details for PubMedID 32487991

  • Multi-omic measurements of heterogeneity in HeLa cells across laboratories NATURE BIOTECHNOLOGY Liu, Y., Mi, Y., Mueller, T., Kreibich, S., Williams, E. G., Van Drogen, A., Borel, C., Franks, M., Germain, P., Bludau, I., Mehnert, M., Seifert, M., Emmenlauer, M., Sorg, I., Bezrukov, F., Bena, F., Zhou, H., Dehio, C., Testa, G., Saez-Rodriguez, J., Antonarakis, S. E., Hardt, W., Aebersold, R. 2019; 37 (3): 314-+


    Reproducibility in research can be compromised by both biological and technical variation, but most of the focus is on removing the latter. Here we investigate the effects of biological variation in HeLa cell lines using a systems-wide approach. We determine the degree of molecular and phenotypic variability across 14 stock HeLa samples from 13 international laboratories. We cultured cells in uniform conditions and profiled genome-wide copy numbers, mRNAs, proteins and protein turnover rates in each cell line. We discovered substantial heterogeneity between HeLa variants, especially between lines of the CCL2 and Kyoto varieties, and observed progressive divergence within a specific cell line over 50 successive passages. Genomic variability has a complex, nonlinear effect on transcriptome, proteome and protein turnover profiles, and proteotype patterns explain the varying phenotypic response of different cell lines to Salmonella infection. These findings have implications for the interpretation and reproducibility of research results obtained from human cultured cells.

    View details for DOI 10.1038/s41587-019-0037-y

    View details for Web of Science ID 000460155900023

    View details for PubMedID 30778230

  • Linking drug target and pathway activation for effective therapy using multi-task learning SCIENTIFIC REPORTS Yang, M., Simm, J., Lam, C., Zakeri, P., van Westen, G. P., Moreau, Y., Saez-Rodriguez, J. 2018; 8: 8322


    Despite the abundance of large-scale molecular and drug-response data, the insights gained about the mechanisms underlying treatment efficacy in cancer have in general been limited. Machine learning algorithms applied to those datasets most often are used to provide predictions without interpretation, or reveal single drug-gene associations and fail to derive robust insights. We propose to use Macau, a Bayesian multitask multi-relational algorithm, to generalize from individual drugs and genes and explore the interactions between the drug targets and signaling pathways' activation. A typical insight would be: "Activation of pathway Y will confer sensitivity to any drug targeting protein X". We applied our methodology to the Genomics of Drug Sensitivity in Cancer (GDSC) screening, using gene expression of 990 cancer cell lines, activity scores of 11 signaling pathways derived from the tool PROGENy as cell line input and 228 nominal targets for 265 drugs as drug input. These interactions can guide a tissue-specific combination treatment strategy, for example suggesting to modulate a certain pathway to maximize the drug response for a given tissue. We confirmed in the literature drug combination strategies derived from our results for brain, skin and stomach tissues. Such an analysis of interactions across tissues might help target discovery, drug repurposing and patient stratification strategies.

    View details for DOI 10.1038/s41598-018-25947-y

    View details for Web of Science ID 000433291300025

    View details for PubMedID 29844324

    View details for PubMedCentralID PMC5974390

  • CELLector: Genomics-Guided Selection of Cancer In Vitro Models CELL SYSTEMS Najgebauer, H., Yang, M. 2020: 424–32.e6


    Selecting appropriate cancer models is a key prerequisite for maximizing translational potential and clinical relevance of in vitro oncology studies. We developed CELLector: an R package and R Shiny application allowing researchers to select the most relevant cancer cell lines in a patient-genomic-guided fashion. CELLector leverages tumor genomics to identify recurrent subtypes with associated genomic signatures. It then evaluates these signatures in cancer cell lines to prioritize their selection. This enables users to choose appropriate in vitro models for inclusion or exclusion in retrospective analyses and future studies. Moreover, this allows bridging outcomes from cancer cell line screens to precisely defined sub-cohorts of primary tumors. Here, we demonstrate the usefulness and applicability of CELLector, showing how it can aid prioritization of in vitro models for future development and unveil patient-derived multivariate prognostic and therapeutic markers. CELLector is freely available.

    View details for DOI 10.1016/j.cels.2020.04.007

  • In silico Prioritization of Transporter-Drug Relationships From Drug Sensitivity Screens FRONTIERS IN PHARMACOLOGY Cesar-Razquin, A., Girardi, E., Yang, M., Brehme, M., Saez-Rodriguez, J., Superti-Furga, G. 2018; 9: 1011


    The interplay between drugs and cell metabolism is a key factor in determining both compound potency and toxicity. In particular, how and to what extent transmembrane transporters affect drug uptake and disposition is currently only partially understood. Most transporter proteins belong to two protein families: the ATP-Binding Cassette (ABC) transporter family, whose members are often involved in xenobiotic efflux and drug resistance, and the large and heterogeneous family of solute carriers (SLCs). We recently argued that SLCs are collectively a rather neglected gene group, with most of its members still poorly characterized, and thus likely to include many yet-to-be-discovered associations with drugs. We searched publicly available resources and literature to define the currently known set of drugs transported by ABCs or SLCs, which involved ∼500 drugs and more than 100 transporters. In order to extend this set, we then mined the largest publicly available pharmacogenomics dataset, which involves approximately 1,000 molecularly annotated cancer cell lines and their response to 265 anti-cancer compounds, and used regularized linear regression models (Elastic Net, LASSO) to predict drug responses based on SLC and ABC data (expression levels, SNVs, CNVs). The most predictive models included both known and previously unidentified associations between drugs and transporters. To our knowledge, this represents the first application of regularized linear regression to this set of genes, providing an extensive prioritization of potentially pharmacologically interesting interactions.

    View details for DOI 10.3389/fphar.2018.01011

    View details for Web of Science ID 000443993400001

    View details for PubMedID 30245630

    View details for PubMedCentralID PMC6137680

  • Genomic Determinants of Protein Abundance Variation in Colorectal Cancer Cells CELL REPORTS Roumeliotis, T. I., Williams, S. P., Goncalves, E., Alsinet, C., Velasco-Herrera, M., Aben, N., Ghavidel, F., Michaut, M., Schubert, M., Price, S., Wright, J. C., Yu, L., Yang, M., Dienstmann, R., Guinney, J., Beltrao, P., Brazma, A., Pardo, M., Stegle, O., Adams, D. J., Wessels, L., Saez-Rodriguez, J., McDermott, U., Choudhary, J. S. 2017; 20 (9): 2201–14


    Assessing the impact of genomic alterations on protein networks is fundamental in identifying the mechanisms that shape cancer heterogeneity. We have used isobaric labeling to characterize the proteomic landscapes of 50 colorectal cancer cell lines and to decipher the functional consequences of somatic genomic variants. The robust quantification of over 9,000 proteins and 11,000 phosphopeptides on average enabled the de novo construction of a functional protein correlation network, which ultimately exposed the collateral effects of mutations on protein complexes. CRISPR-Cas9 deletion of key chromatin modifiers confirmed that the consequences of genomic alterations can propagate through protein interactions in a transcript-independent manner. Lastly, we leveraged the quantified proteome to perform unsupervised classification of the cell lines and to build predictive models of drug response in colorectal cancer. Overall, we provide a deep integrative view of the functional network and the molecular structure underlying the heterogeneity of colorectal cancer cells.

    View details for DOI 10.1016/j.celrep.2017.08.010

    View details for Web of Science ID 000408585000016

    View details for PubMedID 28854368

    View details for PubMedCentralID PMC5583477

  • Looking beyond the cancer cell for effective drug combinations GENOME MEDICINE Dry, J. R., Yang, M., Saez-Rodriguez, J. 2016; 8: 125


    Combinations of therapies are being actively pursued to expand therapeutic options and deal with cancer's pervasive resistance to treatment. Research efforts to discover effective combination treatments have focused on drugs targeting intracellular processes of the cancer cells and in particular on small molecules that target aberrant kinases. Accordingly, most of the computational methods used to study, predict, and develop drug combinations concentrate on these modes of action and signaling processes within the cancer cell. This focus on the cancer cell overlooks significant opportunities to tackle other components of tumor biology that may offer greater potential for improving patient survival. Many alternative strategies have been developed to combat cancer; for example, targeting different cancer cellular processes such as epigenetic control; modulating stromal cells that interact with the tumor; strengthening physical barriers that confine tumor growth; boosting the immune system to attack tumor cells; and even regulating the microbiome to support antitumor responses. We suggest that to fully exploit these treatment modalities using effective drug combinations it is necessary to develop multiscale computational approaches that take into account the full complexity underlying the biology of a tumor, its microenvironment, and a patient's response to the drugs. In this Opinion article, we discuss preliminary work in this area and the needs (in terms of both computational and data requirements) that will truly empower such combinations.

    View details for DOI 10.1186/s13073-016-0379-8

    View details for Web of Science ID 000389006800001

    View details for PubMedID 27887656

    View details for PubMedCentralID PMC5124246