Honors & Awards

  • Dartmouth Graduate Student Travel Awards, Dartmouth College (2019)
  • Dartmouth College MCB student fellowship, Dartmouth College (2016)
  • Brandeis University MCB student fellowship, Brandeis University (2015)
  • The Excellent Graduate Student of Southern Medical University, Southern Medical University (2013)
  • The Excellent Student of Southern Medical University, Southern Medical University (2013)
  • The Excellent Students’ Union Cadre of Southern Medical University, Southern Medical University (2013)
  • National University Innovation Research and Training Program, Southern Medical University (2012)
  • University Innovation Research and Training Program of Guangdong Province, Southern Medical University (2011)
  • “Excellent Student” Scholarship, Southern Medical University (2011)
  • The First Prize in Biochemistry Experiment Skills Contest, Southern Medical University (2010)

Professional Education

  • BS, Southern Medical University, Pharmacology (2013)
  • MSc, Brandeis University, Molecular and Cellular Biology (2015)
  • PhD, Dartmouth College, Genetics (2020)

All Publications

  • Cooperation of chromatin remodeling SWI/SNF complex and pioneer factor AP-1 shapes 3D enhancer landscapes. Nature structural & molecular biology Wolf, B. K., Zhao, Y., McCray, A., Hawk, W. H., Deary, L. T., Sugiarto, N. W., LaCroix, I. S., Gerber, S. A., Cheng, C., Wang, X. 2022


    The mechanism controlling the dynamic targeting of SWI/SNF has long been postulated to be coordinated by transcription factors (TFs), yet demonstrating a specific TF influence has proven difficult. Here we take a multi-omics approach to interrogate transient SWI/SNF interactors, chromatin targeting and the resulting three-dimensional epigenetic landscape. We utilize the labeling technique TurboID to map the SWI/SNF interactome and identify the activator protein-1 (AP-1) family members as critical interacting partners for SWI/SNF complexes. CUT&RUN profiling demonstrates SWI/SNF targeting enrichment at AP-1 bound loci, as well as SWI/SNF-AP-1 cooperation in chromatin targeting. HiChIP reveals AP-1-SWI/SNF-dependent restructuring of the three-dimensional promoter-enhancer architecture and generation of enhancer hubs. Through interrogation of the SWI/SNF-AP-1 interaction, we demonstrate an SWI/SNF dependency on AP-1-mediated chromatin localization. We propose that pioneer factors, such as AP-1, bind and target SWI/SNF to inactive chromatin, where it restructures the genomic landscape into an active state through epigenetic rewiring spanning multiple dimensions.

    View details for DOI 10.1038/s41594-022-00880-x

    View details for PubMedID 36522426

  • VISTA is a checkpoint regulator for naive T cell quiescence and peripheral tolerance SCIENCE ElTanbouly, M. A., Zhao, Y., Nowak, E., Li, J., Schaafsma, E., Le Mercier, I., Ceeraz, S., Lines, J., Peng, C., Carriere, C., Huang, X., Day, M., Koehn, B., Lee, S. W., Morales, M., Hogquist, K. A., Jameson, S. C., Mueller, D., Rothstein, J., Blazar, B. R., Cheng, C., Noelle, R. J. 2020; 367 (6475): 264-+


    Negative checkpoint regulators (NCRs) temper the T cell immune response to self-antigens and limit the development of autoimmunity. Unlike all other NCRs that are expressed on activated T lymphocytes, V-type immunoglobulin domain-containing suppressor of T cell activation (VISTA) is expressed on naïve T cells. We report an unexpected heterogeneity within the naïve T cell compartment in mice, where loss of VISTA disrupted the major quiescent naïve T cell subset and enhanced self-reactivity. Agonistic VISTA engagement increased T cell tolerance by promoting antigen-induced peripheral T cell deletion. Although a critical player in naïve T cell homeostasis, the ability of VISTA to restrain naïve T cell responses was lost under inflammatory conditions. VISTA is therefore a distinctive NCR of naïve T cells that is critical for steady-state maintenance of quiescence and peripheral tolerance.

    View details for DOI 10.1126/science.aay0524

    View details for Web of Science ID 000509802700030

    View details for PubMedID 31949051

    View details for PubMedCentralID PMC7391053

  • A Leukocyte Infiltration Score Defined by a Gene Signature Predicts Melanoma Patient Prognosis. Molecular cancer research : MCR Zhao, Y., Schaafsma, E., Gorlov, I. P., Hernando, E., Thomas, N. E., Shen, R., Turk, M. J., Berwick, M., Amos, C. I., Cheng, C. 2019; 17 (1): 109-119


    Melanoma is the most aggressive type of skin cancer in the United States with an increasing incidence. Melanoma lesions often exhibit high immunogenicity, with infiltrating immune cells playing important roles in regression of tumors occurring spontaneously or caused by therapeutic treatment. Computational and experimental methods have been used to estimate the abundance of immune cells in tumors, but their applications are limited by the requirement of large gene sets or multiple antibodies. Although the prognostic role of immune cells has been appreciated, a systematic investigation of their association with clinical factors, genomic features, prognosis and treatment response in melanoma is still lacking. This study identifies a 25-gene signature based on RNA-seq data from The Cancer Genome Atlas (TCGA)-Skin Cutaneous Melanoma (TCGA-SKCM) dataset. This signature was used to calculate sample-specific Leukocyte Infiltration Scores (LIS) in six independent melanoma microarray datasets and scores were found to vary substantially between different melanoma lesion sites and molecular subtypes. For metastatic melanoma, LIS was prognostic in all datasets with high LIS being associated with good survival. The current approach provided additional prognostic information over established clinical factors, including age, tumor stage, and gender. In addition, LIS was predictive of patient survival in stage III melanoma, and treatment efficacy of tumor-specific antigen vaccine. IMPLICATIONS: This study identifies a 25-gene signature that effectively estimates the level of immune cell infiltration in melanoma, which provides a robust biomarker for predicting patient prognosis.

    View details for DOI 10.1158/1541-7786.MCR-18-0173

    View details for PubMedID 30171176

    View details for PubMedCentralID PMC6318018

  • A P53-Deficiency Gene Signature Predicts Recurrence Risk of Patients with Early-Stage Lung Adenocarcinoma. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology Zhao, Y., Varn, F. S., Cai, G., Xiao, F., Amos, C. I., Cheng, C. 2018; 27 (1): 86-95


    Background: Lung cancer is associated with the highest mortality rate of all cancer types, and the most common histologic subtype of lung cancer is adenocarcinoma. To apply more effective therapeutic treatment, molecular markers that are able to predict the recurrence risk of patients with adenocarcinoma are critically needed. Mutations in the TP53 tumor suppressor gene have been found in approximately 50% of lung adenocarcinoma cases, but the presence of a TP53 mutation does not always associate with increased mortality. Methods: The Cancer Genome Atlas RNA sequencing data of lung adenocarcinoma were used to define a novel gene signature for P53 deficiency. This signature was then used to calculate a sample-specific P53 deficiency score based on a patient's transcriptomic profile and tested in four independent lung adenocarcinoma microarray datasets. Results: In all datasets, P53 deficiency score was a significant predictor for recurrence-free survival where high P53 deficiency score was associated with poor survival. The score was prognostic even after adjusting for several key clinical variables including age, tumor stage, smoking status, and P53 mutation status. Furthermore, the score was able to predict recurrence-free survival in patients with stage I adenocarcinoma and was also associated with smoking status. Conclusions: The P53 deficiency score was a better predictor of recurrence-free survival compared with P53 mutation status and provided additional prognostic value to established clinical factors. Impact: The P53 deficiency score can be used to stratify early-stage patients into subgroups based on their risk of recurrence for aiding physicians to decide personalized therapeutic treatment. Cancer Epidemiol Biomarkers Prev; 27(1); 86-95. ©2017 AACR.

    View details for DOI 10.1158/1055-9965.EPI-17-0478

    View details for PubMedID 29141854

    View details for PubMedCentralID PMC5839302

  • The steroid hormone estriol (E3) regulates epigenetic programming of fetal mouse brain and reproductive tract. BMC biology Zhou, Y., Gu, B., Brichant, G., Singh, J. P., Yang, H., Chang, H., Zhao, Y., Cheng, C., Liu, Z. W., Alderman, M. H., Lu, L., Yang, X., Gao, X. B., Taylor, H. S. 2022; 20 (1): 93


    Estriol (E3) is a steroid hormone formed only during pregnancy in primates including humans. Although E3 is synthesized in large amounts through a complex pathway involving the fetus and placenta, it is not required for the maintenance of pregnancy and has classically been considered virtually inactive due to associated very weak canonical estrogen signaling. However, estrogen exposure during pregnancy may have an effect on organs both within and outside the reproductive system, and compounds with binding affinity for estrogen receptors weaker than E3 have been found to impact reproductive organs and the brain. Here, we explore potential effects of E3 on fetal development using mouse as a model system. We administered E3 to pregnant mice, exposing the fetus to E3. Adult females exposed to E3 in utero (E3-mice) had increased fertility and superior pregnancy outcomes. Female and male E3-mice showed decreased anxiety and increased exploratory behavior. The expression levels and DNA methylation patterns of multiple genes in the uteri and brains of E3-mice were distinct from controls. E3 promoted complexing of estrogen receptors with several DNA/histone modifiers and their binding to target genes. E3 functions by driving epigenetic change, mediated through epigenetic modifier interactions with estrogen receptors rather than through canonical nuclear transcriptional activation. We identify an unexpected functional role for E3 in fetal reproductive system and brain. We further identify a novel mechanism of estrogen action, through recruitment of epigenetic modifiers to estrogen receptors and their target genes, which is not correlated with the traditional view of estrogen potency.

    View details for DOI 10.1186/s12915-022-01293-4

    View details for PubMedID 35491423

    View details for PubMedCentralID PMC9059368

  • Influence of T Cell-Mediated Immune Surveillance on Somatic Mutation Occurrences in Melanoma FRONTIERS IN IMMUNOLOGY Jiang, C., Schaafsma, E., Hong, W., Zhao, Y., Zhu, K., Chao, C., Cheng, C. 2022; 13: 703821


    Neoantigens are presented on the cancer cell surface by peptide-restricted human leukocyte antigen (HLA) proteins and can subsequently activate cognate T cells. It has been hypothesized that the observed somatic mutations in tumors are shaped by immunosurveillance. We investigated all somatic mutations identified in The Cancer Genome Atlas (TCGA) Skin Cutaneous Melanoma (SKCM) samples. By applying a computational algorithm, we calculated the binding affinity of the resulting neo-peptides and their corresponding wild-type peptides with the major histocompatibility complex (MHC) Class I complex. We then examined the relationship between binding affinity alterations and mutation frequency. Our results show that neoantigens derived from recurrent mutations tend to have lower binding affinities with the MHC Class I complex compared to peptides from non-recurrent mutations. Tumor samples harboring recurrent SKCM mutations exhibited lower immune infiltration levels, indicating a relatively colder immune microenvironment. These results suggested that the occurrences of somatic mutations in melanoma have been shaped by immunosurveillance. Mutations that lead to neoantigens with high MHC class I binding affinity are more likely to be eliminated and thus are less likely to be present in tumors.

    View details for DOI 10.3389/fimmu.2021.703821

    View details for Web of Science ID 000885020400001

    View details for PubMedID 35111147

    View details for PubMedCentralID PMC8801458

  • Computational modeling of chromatin accessibility identified important epigenomic regulators BMC GENOMICS Zhao, Y., Dong, Y., Hong, W., Jiang, C., Yao, K., Cheng, C. 2022; 23 (1): 19


    Chromatin accessibility is essential for transcriptional activation of genomic regions. It is well established that transcription factors (TFs) and histone modifications (HMs) play critical roles in chromatin accessibility regulation. However, there is a lack of studies that quantify these relationships. Here we constructed a two-layer model to predict chromatin accessibility by integrating DNA sequence, TF binding, and HM signals. By applying the model to two human cell lines (GM12878 and HepG2), we found that DNA sequences had limited power for accessibility prediction, while both TF binding and HM signals predicted chromatin accessibility with high accuracy. According to the HM model, HM features determined chromatin accessibility in a manner shared across cell lines, with the prediction power attributable to five core HM types. Results from the TF model indicated that chromatin accessibility was determined by a subset of informative TFs including both cell line-specific and generic TFs. The combined model of both TF and HM signals did not further improve the prediction accuracy, indicating that they provide redundant information in terms of chromatin accessibility prediction. The TF and HM models can also distinguish the chromatin accessibility of proximal versus distal transcription start sites with high accuracy.

    View details for DOI 10.1186/s12864-021-08234-5

    View details for Web of Science ID 000740257600001

    View details for PubMedID 34996354

    View details for PubMedCentralID PMC8742372

  • Resident memory CD8(+) T cells in regional lymph nodes mediate immunity to metastatic melanoma IMMUNITY Molodtsov, A. K., Khatwani, N., Vella, J. L., Lewis, K. A., Zhao, Y., Han, J., Sullivan, D. E., Searles, T. G., Preiss, N. K., Shabaneh, T. B., Zhang, P., Hawkes, A. R., Malik, B. T., Kolling, F. W., Usherwood, E. J., Wong, S. L., Phillips, J. D., Shirai, K., Angeles, C., Yan, S., Curiel, T. J., Huang, Y. H., Cheng, C., Turk, M. 2021; 54 (9): 2117-+


    The nature of the anti-tumor immune response changes as primary tumors progress and metastasize. We investigated the role of resident memory (Trm) and circulating memory (Tcirm) cells in anti-tumor responses at metastatic locations using a mouse model of melanoma-associated vitiligo. We found that the transcriptional characteristics of tumor-specific CD8+ T cells were defined by the tissue of occupancy. Parabiosis revealed that tumor-specific Trm and Tcirm compartments persisted throughout visceral organs, but Trm cells dominated lymph nodes (LNs). Single-cell RNA-sequencing profiles of Trm cells in LN and skin were distinct, and T cell clonotypes that occupied both tissues were overwhelmingly maintained as Trm in LNs. Whereas Tcirm cells prevented melanoma growth in the lungs, Trm afforded long-lived protection against melanoma seeding in LNs. Expanded Trm populations were also present in melanoma-involved LNs from patients, and their transcriptional signature predicted better survival. Thus, tumor-specific Trm cells persist in LNs, restricting metastatic cancer.

    View details for DOI 10.1016/j.immuni.2021.08.019

    View details for Web of Science ID 000695789800018

    View details for PubMedID 34525340

    View details for PubMedCentralID PMC9015193

  • AutoEncoder-Based Computational Framework for Tumor Microenvironment Decomposition and Biomarker Identification in Metastatic Melanoma FRONTIERS IN GENETICS Zhao, Y., Dong, Y., Sun, Y., Cheng, C. 2021; 12: 665065


    Melanoma is one of the most aggressive cancer types whose prognosis is determined by both the tumor cell-intrinsic and -extrinsic features as well as their interactions. In this study, we performed systematic and unbiased analysis using The Cancer Genome Atlas (TCGA) melanoma RNA-seq data and identified two gene signatures that captured the intrinsic and extrinsic features, respectively. Specifically, we selected genes that best reflected the expression signals from tumor cells and immune infiltrate cells. Then, we applied an AutoEncoder-based method to decompose the expression of these genes into a small number of representative nodes. Many of these nodes were found to be significantly associated with patient prognosis. From them, we selected two most prognostic nodes and defined a tumor-intrinsic (TI) signature and a tumor-extrinsic (TE) signature. Pathway analysis confirmed that the TE signature recapitulated cytotoxic immune cell related pathways while the TI signature reflected MYC pathway activity. We leveraged these two signatures to investigate six independent melanoma microarray datasets and found that they were able to predict the prognosis of patients under standard care. Furthermore, we showed that the TE signature was also positively associated with patients' response to immunotherapies, including tumor vaccine therapy and checkpoint blockade immunotherapy. This study developed a novel computational framework to capture the tumor-intrinsic and -extrinsic features and identified robust prognostic and predictive biomarkers in melanoma.

    View details for DOI 10.3389/fgene.2021.665065

    View details for Web of Science ID 000659534100001

    View details for PubMedID 34122516

    View details for PubMedCentralID PMC8191580

  • Resident and circulating memory T cells persist for years in melanoma patients with durable responses to immunotherapy NATURE CANCER Han, J., Zhao, Y., Shirai, K., Molodtsov, A., Kolling, F. W., Fisher, J. L., Zhang, P., Yan, S., Searles, T. G., Bader, J. M., Gui, J., Cheng, C., Ernstoff, M. S., Turk, M., Angeles, C. V. 2021; 2 (3): 300-311


    While T-cell responses to cancer immunotherapy have been avidly studied, long-lived memory has been poorly characterized. In a cohort of metastatic melanoma survivors with exceptional responses to immunotherapy, we probed memory CD8+ T-cell responses across tissues, and across several years. Single-cell RNA sequencing revealed three subsets of resident memory T (TRM) cells shared between tumors and distant vitiligo-affected skin. Paired T-cell receptor sequencing further identified clonotypes in tumors that co-existed as TRM in skin and as effector memory T (TEM) cells in blood. Clonotypes that dispersed throughout tumor, skin, and blood preferentially expressed an IFNG/TNF-high signature, which had a strong prognostic value for melanoma patients. Remarkably, clonotypes from tumors were found in patient skin and blood up to nine years later, with skin maintaining the most focused tumor-associated clonal repertoire. These studies reveal that cancer survivors can maintain durable memory as functional, broadly-distributed TRM and TEM compartments.

    View details for DOI 10.1038/s43018-021-00180-1

    View details for Web of Science ID 000634948000007

    View details for PubMedID 34179824

    View details for PubMedCentralID PMC8223731

  • MYC Activity Inference Captures Diverse Mechanisms of Aberrant MYC Pathway Activation in Human Cancers. Molecular cancer research : MCR Schaafsma, E., Zhao, Y., Zhang, L., Li, Y., Cheng, C. 2021; 19 (3): 414-428


    c-MYC (MYC) is deregulated in more than 50% of all cancers. While MYC amplification is the most common MYC-deregulating event, many other alterations can increase MYC activity. We thus systematically investigated MYC pathway activity across different tumor types. Using a logistic regression framework, we established tumor type-specific, transcriptomic-based MYC activity scores that can accurately capture MYC activity. We show that MYC activity scores reflect a variety of MYC-regulating mechanisms, including MYCL and/or MYCN amplification, MYC promoter methylation, MYC mRNA expression, lncRNA PVT1 expression, MYC mutations, and viral integrations near the MYC locus. Our MYC activity score incorporates all of these mechanisms, resulting in better prognostic predictions compared with MYC amplification status, MYC promoter methylation, and MYC mRNA expression in several cancer types. In addition, we show that tumor proliferation and immune evasion are likely contributors to this reduction in survival. Finally, we developed a MYC activity signature for liquid tumors in which MYC translocation is commonly observed, suggesting that our approach can be applied to different types of genomic alterations. In conclusion, we developed a MYC activity score that captures MYC pathway activity and is clinically relevant. IMPLICATIONS: By using cancer type-specific MYC activity profiles, we were able to assess MYC activity across many more tumor types than previously investigated. The range of different MYC-related alterations captured by our MYC activity score can be used to facilitate the application of future MYC inhibitors and aid physicians to preselect patients for targeted therapy.

    View details for DOI 10.1158/1541-7786.MCR-20-0526

    View details for PubMedID 33234576

    View details for PubMedCentralID PMC7925347

  • VISTA: A Target to Manage the Innate Cytokine Storm FRONTIERS IN IMMUNOLOGY ElTanbouly, M. A., Zhao, Y., Schaafsma, E., Burns, C. M., Mabaera, R., Cheng, C., Noelle, R. J. 2021; 11: 595950


    In recent years, the success of immunotherapy targeting immunoregulatory receptors (immune checkpoints) in cancer has generated enthusiastic support to target these receptors in a wide range of other immune related diseases. While the overwhelming focus has been on blockade of these inhibitory pathways to augment immunity, agonistic triggering via these receptors offers the promise of dampening pathogenic inflammatory responses. V-domain Ig suppressor of T cell activation (VISTA) has emerged as an immunoregulatory receptor with constitutive expression on both the T cell and myeloid compartments, and whose agonistic targeting has proven a unique avenue relative to other checkpoint pathways to suppress pathologies mediated by the innate arm of the immune system. VISTA agonistic targeting profoundly changes the phenotype of human monocytes towards an anti-inflammatory cell state, as highlighted by striking suppression of the canonical markers CD14 and Fcγr3a (CD16), and the almost complete suppression of both the interferon I (IFN-I) and antigen presentation pathways. The insights from these very recent studies highlight the impact of VISTA agonistic targeting of myeloid cells, and its potential therapeutic implications in the settings of hyperinflammatory responses such as cytokine storms, driven by dysregulated immune responses to viral infections (with a focus on COVID-19) and autoimmune diseases. Collectively, these findings suggest that the VISTA pathway plays a conserved, non-redundant role in myeloid cell function.

    View details for DOI 10.3389/fimmu.2020.595950

    View details for Web of Science ID 000621350200001

    View details for PubMedID 33643285

    View details for PubMedCentralID PMC7905033

  • Gene signature-based prediction of triple-negative breast cancer patient response to Neoadjuvant chemotherapy CANCER MEDICINE Zhao, Y., Schaafsma, E., Cheng, C. 2020; 9 (17): 6281-6295


    Neoadjuvant chemotherapy is the current standard of care for large, advanced, and/or inoperable tumors, including triple-negative breast cancer. Although the clinical benefits of neoadjuvant chemotherapy have been illustrated through numerous clinical trials, more than half of the patients do not experience therapeutic benefit and needlessly suffer from side effects. Currently, no clinically applicable biomarkers are available for predicting neoadjuvant chemotherapy response in triple-negative breast cancer; the discovery of such a predictive biomarker or marker profile is an unmet need. In this study, we introduce a generic computational framework to calculate a response-probability score (RPS), based on patient transcriptomic profiles, to predict their response to neoadjuvant chemotherapy. We first validated this framework in ER-positive breast cancer patients and showed that it predicted neoadjuvant chemotherapy response with equal performance to several clinically used gene signatures, including Oncotype DX and MammaPrint. Then, we applied this framework to triple-negative breast cancer data and, for each patient, we calculated a response probability score (TNBC-RPS). Our results indicate that the TNBC-RPS achieved the highest accuracy for predicting neoadjuvant chemotherapy response compared to 143 previously proposed gene signatures. When combined with additional clinical factors, the TNBC-RPS achieved a high prediction accuracy for triple-negative breast cancer patients, which was comparable to the prediction accuracy of Oncotype DX and MammaPrint in ER-positive patients. In conclusion, the TNBC-RPS accurately predicts neoadjuvant chemotherapy response in triple-negative breast cancer patients and has the potential to be clinically used to aid physicians in stratifying patients for more effective neoadjuvant chemotherapy.

    View details for DOI 10.1002/cam4.3284

    View details for Web of Science ID 000550518900001

    View details for PubMedID 32692484

    View details for PubMedCentralID PMC7476842

  • An EGFR signature predicts cell line and patient sensitivity to multiple tyrosine kinase inhibitors INTERNATIONAL JOURNAL OF CANCER Cheng, C., Zhao, Y., Schaafsma, E., Weng, Y., Amos, C. 2020; 147 (9): 2621-2633


    EGFR is an oncogene with a high frequency of activating mutations in nonsmall cell lung cancer (NSCLC). EGFR inhibitors have been FDA-approved for NSCLC and have shown efficacy in patients with certain EGFR mutations. However, only 9% to 26% of these patients achieve objective responses. In our study, we developed an EGFR gene signature based on The Cancer Genome Atlas (TCGA) RNA-seq data of lung adenocarcinoma (LUAD) to direct the preselection of patients for more effective EGFR-targeted therapy. This signature infers baseline EGFR signaling pathway activity (denoted as EGFR score) in tumor samples, which is associated with tumor sensitivity to EGFR inhibitors and other tyrosine kinase inhibitors (TKIs). EGFR score predicted sensitivity of lung cancer cell lines to Erlotinib, Gefitinib and Sorafenib. Importantly, EGFR score calculated from pretreated samples was associated with patient response to Gefitinib and Sorafenib in lung cancer. Additionally, integration of the EGFR signature with TCGA LUAD data showed that it accurately predicted functional effects of different somatic EGFR mutations, and identified other mutations affecting EGFR pathway activity. Finally, using cancer cell line and clinical trial data, the EGFR score was associated with patient response to TKIs in liver cancer and other cancer types. The EGFR signature provides a useful biomarker that can expand the application of EGFR inhibitors or other TKIs and improve their treatment efficacy through patient stratification.

    View details for DOI 10.1002/ijc.33053

    View details for Web of Science ID 000537611600001

    View details for PubMedID 32406930

    View details for PubMedCentralID PMC7880578

  • Whole transcriptome signature for prognostic prediction (WTSPP): application of whole transcriptome signature for prognostic prediction in cancer LABORATORY INVESTIGATION Schaafsma, E., Zhao, Y., Wang, Y., Varn, F. S., Zhu, K., Yang, H., Cheng, C. 2020; 100 (10): 1356-1366


    Developing prognostic biomarkers for specific cancer types that accurately predict patient survival is increasingly important in clinical research and practice. Despite the enormous potential of prognostic signatures, proposed models have found limited implementations in routine clinical practice. Herein, we propose a generic, RNA sequencing platform independent, statistical framework named whole transcriptome signature for prognostic prediction to generate prognostic gene signatures. Using ovarian cancer and lung adenocarcinoma as examples, we provide evidence that our prognostic signatures outperform previously reported signatures, capture prognostic features not explained by clinical variables, and expose biologically relevant prognostic pathways, including those involved in the immune system and cell cycle. Our approach demonstrates a robust method for developing prognostic gene expression signatures. In conclusion, our statistical framework can be generally applied to all cancer types for prognostic prediction and might be extended to other human diseases. The proposed method is implemented as an R package (PanCancerSig) and is freely available on GitHub ( https://github.com/Cheng-Lab-GitHub/PanCancer_Signature ).

    View details for DOI 10.1038/s41374-020-0413-8

    View details for Web of Science ID 000518309600001

    View details for PubMedID 32144347

    View details for PubMedCentralID PMC7483260

  • Gene signatures associated with genomic aberrations predict prognosis in neuroblastoma CANCER COMMUNICATIONS He, X., Qin, C., Zhao, Y., Zou, L., Zhao, H., Cheng, C. 2020; 40 (2-3): 105-118


    Neuroblastoma (NB) is a heterogeneous disease with respect to genomic abnormalities and clinical behaviors. Despite recent advances in our understanding of the association between the genetic aberrations and clinical features, it remains one of the major challenges to predict prognosis and stratify patients for determining personalized therapy in this disease. The aim of this study was to develop an effective prognosis prediction model for NB patients. We integrated diverse computational analyses to define gene signatures that reflect MYCN activity and chromosomal aberrations including deletion of chromosome 1p (Chr1p_del) and chromosome 11q (Chr11q_del) as well as chromosome 11q whole loss (Chr11q_wls). We evaluated the prognostic and predictive values of these signatures in seven NB gene expression datasets (the number of samples ranges from 94 to 498, with a total of 2120) generated from both RNA sequencing and microarray platforms. MYCN signature was a more effective prognostic marker than MYCN amplification status and MYCN expression. Similarly, the Chr1p_del score was more prognostic than Chr1p status. The activity scores of MYCN, Chr1p_del and Chr11q_del were associated with poor prognosis, while the Chr11q_wls score was linked to good outcome. We integrated the activity scores of MYCN, Chr1p_del, Chr11q_del, and Chr11q_wls and clinical variables into an integrative prognostic model, which displayed significant performance over the clinical variables or each genomic aberration alone. Our integrative gene signature model shows a significantly improved forecast performance with prognostic and predictive information, and thereby can serve as a biomarker to stratify NB patients for prognosis evaluation and surveillance programs.

    View details for DOI 10.1002/cac2.12016

    View details for Web of Science ID 000523713300004

    View details for PubMedID 32237073

    View details for PubMedCentralID PMC7163660

  • Systematic computational identification of prognostic cytogenetic markers in neuroblastoma BMC MEDICAL GENOMICS Qin, C., He, X., Zhao, Y., Tong, C., Zhu, K. Y., Sun, Y., Cheng, C. 2019; 12 (1): 192


    Neuroblastoma (NB) is the most common extracranial solid tumor found in children. The frequent gain/loss of many chromosome bands in tumor cells and absence of mutations found at diagnosis suggests that NB is a copy number-driven cancer. Despite previous work, a systematic analysis that investigates the relationship between such frequent gain/loss of chromosome bands and patient prognosis has yet to be implemented. First, we analyzed two NB CNV datasets to select chromosomal bands with a high frequency of gain or loss. Second, we applied a computational approach to infer sample-specific CNVs for each chromosomal band selected in step 1 based on gene expression data. Third, we applied univariate Cox proportional hazards models to examine the association between the resulting inferred copy number values (iCNVs) and patient survival. Finally, we applied multivariate Cox proportional hazards models to select chromosomal bands that remained significantly associated with prognosis after adjusting for critical clinical variables, including age, stage, gender, and MYCN amplification status. Here, we used a computational method to infer the copy number variations (CNVs) of sample-specific chromosome bands from NB patient gene expression profiles. The resulting inferred CNVs (iCNVs) were highly correlated with the experimentally determined CNVs, demonstrating that CNVs can be accurately inferred from gene expression profiles. Using this iCNV metric, we identified 58 frequent gain/loss chromosome bands that were significantly associated with patient survival. Furthermore, we found that 7 chromosome bands were still significantly associated with patient survival even when clinical factors, such as MYCN status, were considered. Particularly, we found that the chromosome band chr11p14 has high potential as a novel candidate cytogenetic biomarker for clinical use. Our analysis resulted in a comprehensive list of prognostic chromosome bands supported by strong statistical evidence. In particular, the chr11p14 gain event provided additional prognostic value in addition to well-established clinical factors, including MYCN status, and thereby represents a novel candidate cytogenetic biomarker with high clinical potential. Additionally, this computational framework could be readily extended to other cancer types, such as leukemia.

    View details for DOI 10.1186/s12920-019-0620-6

    View details for Web of Science ID 000502830400001

    View details for PubMedID 31831008

    View details for PubMedCentralID PMC6909636

  • Computational STAT3 activity inference reveals its roles in the pancreatic tumor microenvironment SCIENTIFIC REPORTS Schaafsma, E., Yuan, Y., Zhao, Y., Cheng, C. 2019; 9: 18257


    Transcription factor (TF) STAT3 contributes to pancreatic cancer progression through its regulatory roles in both tumor cells and the tumor microenvironment (TME). In this study, we performed a systematic analysis of all TFs in patient-derived gene expression datasets and confirmed STAT3 as a critical regulator in the pancreatic TME. Importantly, we developed a novel framework that is based on TF target gene expression to distinguish between environmental- and tumor-specific STAT3 activities in gene expression studies. Using this framework, our results showed for the first time that compartment-specific STAT3 activities, but not STAT3 mRNA, are prognostic of clinical values within pancreatic cancer datasets. In addition, high TME-derived STAT3 activity correlates with an immunosuppressive TME in pancreatic cancer, characterized by CD4 T cell and monocyte infiltration and high copy number variation burden. Where environmental-STAT3 seemed to play a dominant role at primary pancreatic sites, tumor-specific STAT3 seemed dominant at metastatic sites where its high activity persisted. In conclusion, by combining compartment-specific inference with other tumor characteristics, including copy number variation and immune-related gene expression, we demonstrate our method's utility as a tool to generate novel hypotheses about TFs in tumor biology.

    View details for DOI 10.1038/s41598-019-54791-x

    View details for Web of Science ID 000501315400001

    View details for PubMedID 31796877

    View details for PubMedCentralID PMC6890662

  • Single-cell RNA sequencing reveals the impact of chromosomal instability on glioblastoma cancer stem cells. BMC medical genomics Zhao, Y., Carter, R., Natarajan, S., Varn, F. S., Compton, D. A., Gawad, C., Cheng, C., Godek, K. M. 2019; 12 (1): 79


    Intra-tumor heterogeneity stems from genetic, epigenetic, functional, and environmental differences among tumor cells. A major source of genetic heterogeneity comes from DNA sequence differences and/or whole chromosome and focal copy number variations (CNVs). Whole chromosome CNVs are caused by chromosomal instability (CIN) that is defined by a persistently high rate of chromosome mis-segregation. Accordingly, CIN causes constantly changing karyotypes that result in extensive cell-to-cell genetic heterogeneity. How the genetic heterogeneity caused by CIN influences gene expression in individual cells remains unknown. We performed single-cell RNA sequencing on a chromosomally unstable glioblastoma cancer stem cell (CSC) line and a control normal, diploid neural stem cell (NSC) line to investigate the impact of CNV due to CIN on gene expression. From the gene expression data, we computationally inferred large-scale CNVs in single cells. Also, we performed copy number adjusted differential gene expression analysis between NSCs and glioblastoma CSCs to identify copy number dependent and independent differentially expressed genes. Here, we demonstrate that gene expression across large genomic regions scales proportionally to whole chromosome copy number in chromosomally unstable CSCs. Also, we show that the differential expression of most genes between normal NSCs and glioblastoma CSCs is largely accounted for by copy number alterations. However, we identify 269 genes whose differential expression in glioblastoma CSCs relative to normal NSCs is independent of copy number. Moreover, a gene signature derived from the subset of genes that are differentially expressed independent of copy number in glioblastoma CSCs correlates with tumor grade and is prognostic for patient survival. These results demonstrate that CIN is directly responsible for gene expression changes and contributes to both genetic and transcriptional heterogeneity among glioblastoma CSCs. These results also demonstrate that the expression of some genes is buffered against changes in copy number, thus preserving some consistency in gene expression levels from cell-to-cell despite the continuous change in karyotype driven by CIN. Importantly, a gene signature derived from the subset of genes whose expression is buffered against copy number alterations correlates with tumor grade and is prognostic for patient survival, which could facilitate patient diagnosis and treatment.

    View details for DOI 10.1186/s12920-019-0532-5

    View details for PubMedID 31151460

    View details for PubMedCentralID PMC6545015

  • Applications of ENCODE data to Systematic Analyses via Data Integration. Current opinion in systems biology Zhao, Y., Schaafsma, E., Cheng, C. 2018; 11: 57-64


    Large-scale genomic data have been utilized to generate unprecedented biological findings and new hypotheses. To delineate functional elements in the human genome, the Encyclopedia of DNA Elements (ENCODE) project has generated an enormous amount of genomic data, yielding around 7,000 data profiles in different cell and tissue types. In this article, we reviewed the systematic analyses that have integrated ENCODE data with other data sources to reveal new biological insights, ranging from human genome annotation to the identification of new candidate drugs. These analyses demonstrate the critical impact of ENCODE data on basic biology and translational research.

    View details for DOI 10.1016/j.coisb.2018.08.010

    View details for PubMedID 31011690

    View details for PubMedCentralID PMC6474416

  • Comparative Analysis of Mutant Huntingtin Binding Partners in Yeast Species SCIENTIFIC REPORTS Zhao, Y., Zurawel, A. A., Jenkins, N. P., Duennwald, M. L., Cheng, C., Kettenbach, A. N., Supattapone, S. 2018; 8: 9554


    Huntington's disease is caused by the pathological expansion of a polyglutamine (polyQ) stretch in Huntingtin (Htt), but the molecular mechanisms by which polyQ expansion in Htt causes toxicity in selective neuronal populations remain poorly understood. Interestingly, heterologous expression of expanded polyQ Htt is toxic in Saccharomyces cerevisiae cells, but has no effect in Schizosaccharomyces pombe, a related yeast species possessing very few endogenous polyQ or Q/N-rich proteins. Here, we used a comprehensive and unbiased mass spectrometric approach to identify proteins that bind Htt in a length-dependent manner in both species. Analysis of the expanded polyQ-associated proteins reveals marked enrichment of proteins that are localized to and play functional roles in nucleoli and mitochondria in S. cerevisiae, but not in S. pombe. Moreover, expanded polyQ Htt appears to interact preferentially with endogenous polyQ and Q/N-rich proteins, which are rare in S. pombe, as well as proteins containing coiled-coil motifs in S. cerevisiae. Taken together, these results suggest that polyQ expansion of Htt may cause cellular toxicity in S. cerevisiae by sequestering endogenous polyQ and Q/N-rich proteins, particularly within nucleoli and mitochondria.

    View details for DOI 10.1038/s41598-018-27900-5

    View details for Web of Science ID 000436046500068

    View details for PubMedID 29934597

    View details for PubMedCentralID PMC6015068

  • Replication origin-flanking roadblocks reveal origin-licensing dynamics and altered sequence dependence JOURNAL OF BIOLOGICAL CHEMISTRY Warner, M. D., Azmi, I. F., Kang, S., Zhao, Y., Bell, S. P. 2017; 292 (52): 21417-21430


    In eukaryotes, DNA replication initiates from multiple origins of replication for timely genome duplication. These sites are selected by origin licensing, during which the core enzyme of the eukaryotic DNA replicative helicase, the Mcm2-7 (minichromosome maintenance) complex, is loaded at each origin. This origin licensing requires loading two Mcm2-7 helicases around origin DNA in a head-to-head orientation. Current models suggest that the origin-recognition complex (ORC) and cell-division cycle 6 (Cdc6) proteins recognize and encircle origin DNA and assemble an Mcm2-7 double-hexamer around adjacent double-stranded DNA. To test this model and assess the location of Mcm2-7 initial loading, we placed DNA-protein roadblocks at defined positions adjacent to the essential ORC-binding site within Saccharomyces cerevisiae origin DNA. Roadblocks were made either by covalent cross-linking of the HpaII methyltransferase to DNA or through binding of a transcription activator-like effector (TALE) protein. Contrary to the sites of Mcm2-7 recruitment being precisely defined, only single roadblocks that inhibited ORC-DNA binding showed helicase loading defects. We observed inhibition of helicase loading without inhibition of ORC-DNA binding only when roadblocks were placed on both sides of the origin to restrict sliding of a helicase-loading intermediate. Consistent with a sliding helicase-loading intermediate, when either one of the flanking roadblocks was eliminated, the remaining roadblock had no effect on helicase loading. Interestingly, either origin-flanking nucleosomes or roadblocks resulted in helicase loading being dependent on an additional origin sequence known to be a weaker ORC-DNA-binding site. Together, our findings support a model in which sliding helicase-loading intermediates increase the flexibility of the DNA sequence requirements for origin licensing.

    View details for DOI 10.1074/jbc.M117.815639

    View details for Web of Science ID 000419013000018

    View details for PubMedID 29074622

    View details for PubMedCentralID PMC5766963