All Publications

  • Identification and transcriptomic assessment of latent profile pediatric septic shock phenotypes. Critical care (London, England) Atreya, M. R., Huang, M., Moore, A. R., Zheng, H., Hasin-Brumshtein, Y., Fitzgerald, J. C., Weiss, S. L., Cvijanovich, N. Z., Bigham, M. T., Jain, P. N., Schwarz, A. J., Lutfi, R., Nowak, J., Thomas, N. J., Quasney, M., Dahmer, M. K., Baines, T., Haileselassie, B., Lautz, A. J., Stanski, N. L., Standage, S. W., Kaplan, J. M., Zingarelli, B., Sahay, R., Zhang, B., Sweeney, T. E., Khatri, P., Sanchez-Pinto, L. N., Kamaleswaran, R. 2024; 28 (1): 246


    Sepsis poses a grave threat, especially among children, but treatments are limited owing to heterogeneity among patients. We sought to test the clinical and biological relevance of pediatric septic shock subclasses identified using reproducible approaches. We performed latent profile analyses using clinical, laboratory, and biomarker data from a prospective multi-center pediatric septic shock observational cohort to derive phenotypes and trained a support vector machine model to assign phenotypes in an internal validation set. We established the clinical relevance of phenotypes and tested for their interaction with common sepsis treatments on patient outcomes. We conducted transcriptomic analyses to delineate phenotype-specific biology and inferred underlying cell subpopulations. Finally, we compared whether latent profile phenotypes overlapped with established gene-expression endotypes and compared survival among patients based on an integrated subclassification scheme. Among the 1071 included pediatric septic shock patients requiring vasoactive support on day 1, we identified two phenotypes, which we designated as Phenotype 1 (19.5%) and Phenotype 2 (80.5%). Membership in Phenotype 1 was associated with approximately fourfold adjusted odds of complicated course relative to Phenotype 2. Patients belonging to Phenotype 1 were characterized by relatively higher Angiopoietin-2/Tie-2 ratio, Angiopoietin-2, soluble thrombomodulin (sTM), interleukin 8 (IL-8), and intercellular adhesion molecule 1 (ICAM-1) and lower Tie-2 and Angiopoietin-1 concentrations compared to Phenotype 2. We did not identify significant interactions between phenotypes, common treatments, and clinical outcomes. Transcriptomic analysis revealed overexpression of genes implicated in the innate immune response and driven primarily by developing neutrophils among patients designated as Phenotype 1. There was no statistically significant overlap between established gene-expression endotypes, reflective of the host adaptive response, and the newly derived phenotypes, reflective of the host innate response including microvascular endothelial dysfunction. However, an integrated subclassification scheme demonstrated varying survival probabilities when comparing patient endophenotypes. Our research underscores the reproducibility of latent profile analyses to identify pediatric septic shock phenotypes with high prognostic relevance. Pending validation, an integrated subclassification scheme, reflective of the different facets of the host response, holds promise to inform targeted intervention among those critically ill.

    View details for DOI 10.1186/s13054-024-05020-z

    View details for PubMedID 39014377

    View details for PubMedCentralID 6970225

  • Impaired innate and adaptive immune responses to BNT162b2 SARS-CoV-2 vaccination in systemic lupus erythematosus. JCI insight Sarin, K. Y., Zheng, H., Chaichian, Y., Arunachalam, P. S., Swaminathan, G., Eschholz, A., Gao, F., Wirz, O. F., Lam, B., Yang, E., Lee, L. W., Feng, A., Lewis, M. A., Lin, J., Maecker, H. T., Boyd, S. D., Davis, M. M., Nadeau, K. C., Pulendran, B., Khatri, P., Utz, P. J., Zaba, L. C. 2024; 9 (5)


    Understanding the immune responses to SARS-CoV-2 vaccination is critical to optimizing vaccination strategies for individuals with autoimmune diseases, such as systemic lupus erythematosus (SLE). Here, we comprehensively analyzed innate and adaptive immune responses in 19 patients with SLE receiving a complete 2-dose Pfizer-BioNTech mRNA vaccine (BNT162b2) regimen compared with a control cohort of 56 healthy control (HC) volunteers. Patients with SLE exhibited impaired neutralizing antibody production and antigen-specific CD4+ and CD8+ T cell responses relative to HC. Interestingly, antibody responses were only altered in patients with SLE treated with immunosuppressive therapies, whereas impairment of antigen-specific CD4+ and CD8+ T cell numbers was independent of medication. Patients with SLE also displayed reduced levels of circulating CXC motif chemokine ligands, CXCL9, CXCL10, CXCL11, and IFN-γ after secondary vaccination as well as downregulation of gene expression pathways indicative of compromised innate immune responses. Single-cell RNA-Seq analysis revealed that patients with SLE showed reduced levels of a vaccine-inducible monocyte population characterized by overexpression of IFN-response transcription factors. Thus, although 2 doses of BNT162b2 induced relatively robust immune responses in patients with SLE, our data demonstrate impairment of both innate and adaptive immune responses relative to HC, highlighting a need for population-specific vaccination studies.

    View details for DOI 10.1172/jci.insight.176556

    View details for PubMedID 38456511

  • Integrative systems biology reveals NKG2A-biased immune responses correlate with protection in infectious disease, autoimmune disease, and cancer. Cell reports Chen, D. G., Xie, J., Choi, J., Ng, R. H., Zhang, R., Li, S., Edmark, R., Zheng, H., Solomon, B., Campbell, K. M., Medina, E., Ribas, A., Khatri, P., Lanier, L. L., Mease, P. J., Goldman, J. D., Su, Y., Heath, J. R. 2024; 43 (3): 113872


    Infection, autoimmunity, and cancer are principal human health challenges of the 21st century. Often regarded as distinct ends of the immunological spectrum, recent studies hint at potential overlap between these diseases. For example, inflammation can be pathogenic in infection and autoimmunity. T resident memory (TRM) cells can be beneficial in infection and cancer. However, these findings are limited by size and scope; exact immunological factors shared across diseases remain elusive. Here, we integrate large-scale deeply clinically and biologically phenotyped human cohorts of 526 patients with infection, 162 with lupus, and 11,180 with cancer. We identify an NKG2A+ immune bias as associative with protection against disease severity, mortality, and autoimmune/post-acute chronic disease. We reveal that NKG2A+ CD8+ T cells correlate with reduced inflammation and increased humoral immunity and that they resemble TRM cells. Our results suggest NKG2A+ biases as a cross-disease factor of protection, supporting suggestions of immunological overlap between infection, autoimmunity, and cancer.

    View details for DOI 10.1016/j.celrep.2024.113872

    View details for PubMedID 38427562

  • Systems immunology of transcriptional responses to viral infection identifies conserved antiviral pathways across macaques and humans. Cell reports Ratnasiri, K., Zheng, H., Toh, J., Yao, Z., Duran, V., Donato, M., Roederer, M., Kamath, M., Todd, J. M., Gagne, M., Foulds, K. E., Francica, J. R., Corbett, K. S., Douek, D. C., Seder, R. A., Einav, S., Blish, C. A., Khatri, P. 2024; 43 (2): 113706


    Viral pandemics and epidemics pose a significant global threat. While macaque models of viral disease are routinely used, it remains unclear how conserved antiviral responses are between macaques and humans. Therefore, we conducted a cross-species analysis of transcriptomic data from over 6,088 blood samples from macaques and humans infected with one of 31 viruses. Our findings demonstrate that irrespective of primate or viral species, there are conserved antiviral responses that are consistent across infection phase (acute, chronic, or latent) and viral genome type (DNA or RNA viruses). Leveraging longitudinal data from experimental challenges, we identify virus-specific response kinetics such as host responses to Coronaviridae and Orthomyxoviridae infections peaking 1-3 days earlier than responses to Filoviridae and Arenaviridae viral infections. Our results underscore macaque studies as a powerful tool for understanding viral pathogenesis and immune responses that translate to humans, with implications for viral therapeutic development and pandemic preparedness.

    View details for DOI 10.1016/j.celrep.2024.113706

    View details for PubMedID 38294906

  • An NKG2A biased immune response confers protection for infection, autoimmune disease, and cancer. Research square Heath, J., Chen, D., Xie, J., Choi, J., Ng, R., Zhang, R., Li, S., Edmark, R., Zheng, H., Solomon, B., Campbell, K., Medina, E., Ribas, A., Khatri, P., Lanier, L., Mease, P., Goldman, J., Su, Y. 2023


    Infection, autoimmunity, and cancer are the principal human health challenges of the 21st century and major contributors to human death and disease. Often regarded as distinct ends of the immunological spectrum, recent studies have hinted there may be more overlap between these diseases than appears. For example, pathogenic inflammation has been demonstrated as conserved between infection and autoimmune settings. T resident memory (TRM) cells have been highlighted as beneficial for infection and cancer. However, these findings are limited by patient number and disease scope; exact immunological factors shared across disease remain elusive. Here, we integrate large-scale deeply clinically and biologically phenotyped human cohorts of 526 patients with infection, 162 with lupus, and 11,180 with cancer. We identify an NKG2A+ immune bias as associative with protection against disease severity, mortality, and autoimmune and post-acute chronic disease. We reveal that NKG2A+ CD8+ T cells correlate with reduced inflammation, increased humoral immunity, and resemble TRM cells. Our results suggest that an NKG2A+ bias is a pan-disease immunological factor of protection and thus supports recent suggestions that there is immunological overlap between infection, autoimmunity, and cancer. Our findings underscore the promotion of an NKG2A+ biased response as a putative therapeutic strategy.

    View details for DOI 10.21203/

    View details for PubMedID 37886475

  • Multi-omics analysis of mucosal and systemic immunity to SARS-CoV-2 after birth. Cell Wimmers, F., Burrell, A. R., Feng, Y., Zheng, H., Arunachalam, P. S., Hu, M., Spranger, S., Nyhoff, L. E., Joshi, D., Trisal, M., Awasthi, M., Bellusci, L., Ashraf, U., Kowli, S., Konvinse, K. C., Yang, E., Blanco, M., Pellegrini, K., Tharp, G., Hagan, T., Chinthrajah, R. S., Nguyen, T. T., Grifoni, A., Sette, A., Nadeau, K. C., Haslam, D. B., Bosinger, S. E., Wrammert, J., Maecker, H. T., Utz, P. J., Wang, T. T., Khurana, S., Khatri, P., Staat, M. A., Pulendran, B. 2023


    The dynamics of immunity to infection in infants remain obscure. Here, we used a multi-omics approach to perform a longitudinal analysis of immunity to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in infants and young children by analyzing blood samples and weekly nasal swabs collected before, during, and after infection with Omicron and non-Omicron variants. Infection stimulated robust antibody titers that, unlike in adults, showed no sign of decay for up to 300 days. Infants mounted a robust mucosal immune response characterized by inflammatory cytokines, interferon (IFN) α, and T helper (Th) 17 and neutrophil markers (interleukin [IL]-17, IL-8, and CXCL1). The immune response in blood was characterized by upregulation of activation markers on innate cells, no inflammatory cytokines, but several chemokines and IFNα. The latter correlated with viral load and expression of interferon-stimulated genes (ISGs) in myeloid cells measured by single-cell multi-omics. Together, these data provide a snapshot of immunity to infection during the initial weeks and months of life.

    View details for DOI 10.1016/j.cell.2023.08.044

    View details for PubMedID 37776858

  • Prediction of HLA genotypes from single-cell transcriptome data. Frontiers in immunology Solomon, B. D., Zheng, H., Dillon, L. W., Goldman, J. D., Hourigan, C. S., Heath, J. R., Khatri, P. 2023; 14: 1146826


    The human leukocyte antigen (HLA) locus plays a central role in adaptive immune function and has significant clinical implications for tissue transplant compatibility and allelic disease associations. Studies using bulk-cell RNA sequencing have demonstrated that HLA transcription may be regulated in an allele-specific manner and single-cell RNA sequencing (scRNA-seq) has the potential to better characterize these expression patterns. However, quantification of allele-specific expression (ASE) for HLA loci requires sample-specific reference genotyping due to extensive polymorphism. While genotype prediction from bulk RNA sequencing is well described, the feasibility of predicting HLA genotypes directly from single-cell data is unknown. Here we evaluate and expand upon several computational HLA genotyping tools by comparing predictions from human single-cell data to gold-standard molecular genotyping. The highest 2-field accuracy averaged across all loci was 76% by arcasHLA and increased to 86% using a composite model of multiple genotyping tools. We also developed a highly accurate model (AUC 0.93) for predicting HLA-DRB345 copy number in order to improve genotyping accuracy of the HLA-DRB locus. Genotyping accuracy improved with read depth and was reproducible at repeat sampling. Using a meta-analytic approach, we also show that HLA genotypes from PHLAT and OptiType can generate ASE ratios that are highly correlated (R² = 0.8 and 0.94, respectively) with those derived from gold-standard genotyping.

    View details for DOI 10.3389/fimmu.2023.1146826

    View details for PubMedID 37180102

  • Serum proteome analysis of systemic JIA and related lung disease identifies distinct inflammatory programs and biomarkers. Arthritis & rheumatology (Hoboken, N.J.) Chen, G., Deutsch, G. H., Schulert, G., Zheng, H., Jang, S., Trapnell, B., Lee, P., Macaubas, C., Ho, K., Schneider, C., Saper, V. E., de Jesus, A. A., Krasnow, M., Grom, A., Goldbach-Mansky, R., Khatri, P., Mellins, E. D., Canna, S. W. 2022


    OBJECTIVES: Recent observations in systemic Juvenile Idiopathic Arthritis (sJIA) suggest an increasing incidence of high-mortality interstitial lung disease (sJIA-LD) often characterized by a variant of pulmonary alveolar proteinosis (PAP). Co-occurrence of macrophage activation syndrome (MAS) and PAP in sJIA suggested a shared pathology, but sJIA-LD patients also commonly experience features of drug reaction such as atypical rashes and eosinophilia. We sought to investigate immunopathology and identify biomarkers in sJIA, MAS, and sJIA-LD. METHODS: We used SOMAscan to measure >1300 analytes in sera from healthy controls and patients with sJIA, MAS, sJIA-LD and other related diseases. We verified selected findings by ELISA and lung immunostaining. Because the proteome of a sample may reflect multiple states (sJIA, MAS, sJIA-LD), we used regression modeling to identify subsets of altered proteins associated with each state. We tested key findings in a validation cohort. RESULTS: Proteome alterations in active sJIA and MAS overlapped substantially, including known sJIA biomarkers like SAA and S100A9, and novel elevations of heat shock proteins and glycolytic enzymes. IL-18 was elevated in all sJIA groups, particularly MAS and sJIA-LD. We also identified an MAS-independent sJIA-LD signature notable for elevated ICAM5, MMP7, and allergic/eosinophilic chemokines, which have been previously associated with lung damage. Immunohistochemistry localized ICAM5 and MMP7 in sJIA-LD lung. ICAM5's ability to distinguish sJIA-LD from sJIA/MAS was independently validated. CONCLUSION: Serum proteins support an sJIA-to-MAS continuum, help distinguish sJIA, sJIA/MAS, and sJIA-LD and suggest etiologic hypotheses. Select biomarkers, such as ICAM5, could aid in early detection and management of sJIA-LD.

    View details for DOI 10.1002/art.42099

    View details for PubMedID 35189047

  • A 6-mRNA host response classifier in whole blood predicts outcomes in COVID-19 and other acute viral infections. Scientific reports Buturovic, L., Zheng, H., Tang, B., Lai, K., Kuan, W. S., Gillett, M., Santram, R., Shojaei, M., Almansa, R., Nieto, J. A., Munoz, S., Herrero, C., Antonakos, N., Koufargyris, P., Kontogiorgi, M., Damoraki, G., Liesenfeld, O., Wacker, J., Midic, U., Luethy, R., Rawling, D., Remmel, M., Coyle, S., Liu, Y. E., Rao, A. M., Dermadi, D., Toh, J., Jones, L. M., Donato, M., Khatri, P., Giamarellos-Bourboulis, E. J., Sweeney, T. E. 2022; 12 (1): 889


    Predicting the severity of COVID-19 remains an unmet medical need. Our objective was to develop a blood-based host-gene-expression classifier for the severity of viral infections and validate it in independent data, including COVID-19. We developed a logistic regression-based classifier for the severity of viral infections and validated it in multiple viral infection settings including COVID-19. We used training data (N=705) from 21 retrospective transcriptomic clinical studies of influenza and other viral illnesses looking at a preselected panel of host immune response messenger RNAs. We selected 6 host RNAs and trained a logistic regression classifier with a cross-validation area under curve of 0.90 for predicting 30-day mortality in viral illnesses. Next, in 1417 samples across 21 independent retrospective cohorts, the locked 6-RNA classifier had an area under curve of 0.94 for discriminating patients with severe vs. non-severe infection. Next, in independent cohorts of prospectively (N=97) and retrospectively (N=100) enrolled patients with confirmed COVID-19, the classifier had an area under curve of 0.89 and 0.87, respectively, for identifying patients with severe respiratory failure or 30-day mortality. Finally, we developed a loop-mediated isothermal gene expression assay for the 6-messenger-RNA panel to facilitate implementation as a rapid assay. With further study, the classifier could assist in the risk assessment of patients with COVID-19 and other acute viral infections to determine severity and level of care, thereby improving patient management and reducing healthcare burden.

    View details for DOI 10.1038/s41598-021-04509-9

    View details for PubMedID 35042868

  • NSD1 mutations deregulate transcription and DNA methylation of bivalent developmental genes in Sotos syndrome. Human molecular genetics Brennan, K., Zheng, H., Fahrner, J. A., Shin, J. H., Gentles, A. J., Schaefer, B., Sunwoo, J. B., Bernstein, J. A., Gevaert, O. 2022


    Sotos syndrome (SS), the most common overgrowth with intellectual disability (OGID) disorder, is caused by inactivating germline mutations of NSD1, which encodes a histone H3 lysine 36 methyltransferase. To understand how NSD1 inactivation deregulates transcription and DNA methylation (DNAm), and to explore how these abnormalities affect human development, we profiled transcription and DNAm in SS patients and healthy control individuals. We identified a transcriptional signature that distinguishes individuals with SS from controls and was also deregulated in NSD1 mutated cancers. Most abnormally expressed genes displayed reduced expression in SS; these downregulated genes consisted mostly of bivalent genes and were enriched for regulators of development and neural synapse function. DNA hypomethylation was strongly enriched within promoters of transcriptionally deregulated genes: Overexpressed genes displayed hypomethylation at their transcription start sites (TSSs) while underexpressed genes featured hypomethylation at polycomb binding sites within their promoter CpG island shores. SS patients featured accelerated molecular aging at the levels of both transcription and DNAm. Overall, these findings indicate that NSD1-deposited H3K36 methylation regulates transcription by directing promoter DNA methylation, partially by repressing polycomb repressive complex 2 (PRC2) activity. These findings could explain the phenotypic similarity of SS to OGID disorders that are caused by mutations in PRC2 complex-encoding genes.

    View details for DOI 10.1093/hmg/ddac026

    View details for PubMedID 35094088

  • Tumor response as defined by iRECIST in gastrointestinal malignancies treated with PD-1 and PD-L1 inhibitors and correlation with survival. BMC cancer Xie, P., Zheng, H., Chen, H., Wei, K., Pan, X., Xu, Q., Wang, Y., Tang, C., Gevaert, O., Meng, X. 2021; 21 (1): 1246


    BACKGROUND: Atypical tumor response patterns during immune checkpoint inhibitor therapy pose a challenge to clinicians and investigators in immuno-oncology practice. This study evaluated tumor burden dynamics to identify imaging biomarkers for treatment response and overall survival (OS) in advanced gastrointestinal malignancies treated with PD-1/PD-L1 inhibitors. METHODS: This retrospective study included a total of 198 target lesions in 75 patients with advanced gastrointestinal malignancies treated with PD-1/PD-L1 inhibitors between January 2017 and March 2021. Tumor diameter changes as defined by immunotherapy Response Evaluation Criteria in Solid Tumors (iRECIST) were studied to determine treatment response and association with OS. RESULTS: Based on the best overall response, the change in tumor diameter ranged from -100% to +135.3% (median: -9.6%). The overall response rate was 32.0% (24/75), and the rate of durable disease control for at least 6 months was 30.7% (23/75: 1 iCR [immune complete response], 20 iPR [immune partial response], and 2 iSD [immune stable disease]). Using univariate analysis, patients with a tumor diameter maintaining a <20% increase (48/75, 64.0%) from baseline had longer OS than those with a ≥20% increase (27/75, 36.0%) and a reduced risk of death (median OS: 80 months vs. 48 months, HR=0.22, P=0.034). The differences in age (HR=1.09, P=0.01), combined surgery (HR=0.15, P=0.01) and cancer type (HR=0.23, P=0.001) were significant. In multivariable analysis, patients with a tumor diameter with a <20% increase had notably reduced hazards of death (HR=0.15, P=0.01) after adjusting for age, combined surgery, KRAS status, cancer type, mismatch repair (MMR) status, treatment course and cancer differentiation. Two patients (2.7%) showed pseudoprogression. CONCLUSIONS: Tumor diameter with a <20% increase from baseline during therapy in gastrointestinal malignancies was associated with therapeutic benefit and longer OS and may serve as a practical imaging marker for treatment response, clinical outcome and treatment decision making.

    View details for DOI 10.1186/s12885-021-08944-9

    View details for PubMedID 34798858

  • Multi-cohort analysis of host immune response identifies conserved protective and detrimental modules associated with severity across viruses. Immunity Zheng, H., Rao, A. M., Dermadi, D., Toh, J., Murphy Jones, L., Donato, M., Liu, Y., Su, Y., Dai, C. L., Kornilov, S. A., Karagiannis, M., Marantos, T., Hasin-Brumshtein, Y., He, Y. D., Giamarellos-Bourboulis, E. J., Heath, J. R., Khatri, P. 2021


    Viral infections induce a conserved host response distinct from bacterial infections. We hypothesized that the conserved response is associated with disease severity and is distinct between patients with different outcomes. To test this, we integrated 4,780 blood transcriptome profiles from patients aged 0 to 90 years infected with one of 16 viruses, including SARS-CoV-2, Ebola, chikungunya, and influenza, across 34 cohorts from 18 countries, and single-cell RNA sequencing profiles of 702,970 immune cells from 289 samples across three cohorts. Severe viral infection was associated with increased hematopoiesis, myelopoiesis, and myeloid-derived suppressor cells. We identified protective and detrimental gene modules that defined distinct trajectories associated with mild versus severe outcomes. The interferon response was decoupled from the protective host response in patients with severe outcomes. These findings were consistent, irrespective of age and virus, and provide insights to accelerate the development of diagnostics and host-directed therapies to improve global pandemic preparedness.

    View details for DOI 10.1016/j.immuni.2021.03.002

    View details for PubMedID 33765435

  • Artificial intelligence and data science applied to bioengineering. AIMS Bioengineering Espin-Perez, A., Bozkurt, S., Zheng, H., Nivina, A. 2021; 8 (1): 93–94

  • A meta-learning approach for genomic survival analysis. Nature communications Qiu, Y. L., Zheng, H., Devos, A., Selby, H., Gevaert, O. 2020; 11 (1): 6350


    RNA sequencing has emerged as a promising approach in cancer prognosis as sequencing data becomes more easily and affordably accessible. However, it remains challenging to build good predictive models especially when the sample size is limited and the number of features is high, which is a common situation in biomedical settings. To address these limitations, we propose a meta-learning framework based on neural networks for survival analysis and evaluate it in a genomic cancer research setting. We demonstrate that, compared to regular transfer-learning, meta-learning is a significantly more effective paradigm to leverage high-dimensional data that is relevant but not directly related to the problem of interest. Specifically, meta-learning explicitly constructs a model, from abundant data of relevant tasks, to learn a new task with few samples effectively. For the application of predicting cancer survival outcome, we also show that the meta-learning framework with a few samples is able to achieve competitive performance with learning from scratch with a significantly larger number of samples. Finally, we demonstrate that the meta-learning model implicitly prioritizes genes based on their contribution to survival prediction and allows us to identify important pathways in cancer.

    View details for DOI 10.1038/s41467-020-20167-3

    View details for PubMedID 33311484

  • Genomic data imputation with variational auto-encoders. GigaScience Qiu, Y. L., Zheng, H., Gevaert, O. 2020; 9 (8)


    As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets and it is difficult to modify these algorithms to handle certain cases not missing at random. In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performances than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on the imputation performances and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder. We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.

    View details for DOI 10.1093/gigascience/giaa082

    View details for PubMedID 32761097

  • Whole slide images reflect DNA methylation patterns of human tumors. NPJ genomic medicine Zheng, H., Momeni, A., Cedoz, P. L., Vogel, H., Gevaert, O. 2020; 5 (1): 11


    DNA methylation is an important epigenetic mechanism regulating gene expression and its role in carcinogenesis has been extensively studied. High-throughput DNA methylation assays have been used broadly in cancer research. Histopathology images are commonly obtained in cancer treatment, given that tissue sampling remains the clinical gold-standard for diagnosis. In this work, we investigate the interaction between cancer histopathology images and DNA methylation profiles to provide a better understanding of tumor pathobiology at the epigenetic level. We demonstrate that classical machine learning algorithms can associate the DNA methylation profiles of cancer samples with morphometric features extracted from whole slide images. Furthermore, grouping the genes into methylation clusters greatly improves the performance of the models. The well-predicted genes are enriched in key pathways in carcinogenesis including hypoxia in glioma and angiogenesis in renal cell carcinoma. Our results provide new insights into the link between histopathological and molecular data.

    View details for DOI 10.1038/s41525-020-0120-9

    View details for PubMedID 33574267

  • Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples. GigaScience Zheng, H., Brennan, K., Hernaez, M., Gevaert, O. 2019; 8 (12)


    Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification. In this study, we compared the performance of pseudoalignment methods Kallisto and Salmon, alignment-based transcript quantification method RSEM, and alignment-based gene quantification methods HTSeq and featureCounts, in combination with read aligners STAR, Subread, and HISAT2, in lncRNA quantification, by applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification at both sample- and gene-level comparison, regardless of RNA-Seq protocol type, choice of aligners, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. On the contrary, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, which can be improved using stranded protocols and pseudoalignment methods. Considering the consistency with ground truth and computational resources, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.

    View details for DOI 10.1093/gigascience/giz145

    View details for PubMedID 31808800

  • Establishment and characterization of new tumor xenografts and cancer cell lines from EBV-positive nasopharyngeal carcinoma. Nature communications Lin, W., Yip, Y. L., Jia, L., Deng, W., Zheng, H., Dai, W., Ko, J. M., Lo, K. W., Chung, G. T., Yip, K. Y., Lee, S., Kwan, J. S., Zhang, J., Liu, T., Chan, J. Y., Kwong, D. L., Lee, V. H., Nicholls, J. M., Busson, P., Liu, X., Chiang, A. K., Hui, K. F., Kwok, H., Cheung, S. T., Cheung, Y. C., Chan, C. K., Li, B., Cheung, A. L., Hau, P. M., Zhou, Y., Tsang, C. M., Middeldorp, J., Chen, H., Lung, M. L., Tsao, S. W. 2018; 9 (1): 4663


    The lack of representative nasopharyngeal carcinoma (NPC) models has seriously hampered research on EBV carcinogenesis and preclinical studies in NPC. Here we report the successful growth of five NPC patient-derived xenografts (PDXs) from fifty-eight attempts of transplantation of NPC specimens into NOD/SCID mice. The take rates for primary and recurrent NPC are 4.9% and 17.6%, respectively. Successful establishment of a new EBV-positive NPC cell line, NPC43, is achieved directly from patient NPC tissues by including Rho-associated coiled-coil containing kinases inhibitor (Y-27632) in culture medium. Spontaneous lytic reactivation of EBV can be observed in NPC43 upon withdrawal of Y-27632. Whole-exome sequencing (WES) reveals a close similarity in mutational profiles of these NPC PDXs with their corresponding patient NPC. Whole-genome sequencing (WGS) further delineates the genomic landscape and sequences of EBV genomes in these newly established NPC models, which supports their potential use in future studies of NPC.

    View details for PubMedID 30405107

  • A radiogenomic dataset of non-small cell lung cancer. Scientific data Bakr, S., Gevaert, O., Echegaray, S., Ayers, K., Zhou, M., Shafiq, M., Zheng, H., Benson, J. A., Zhang, W., Leung, A. N., Kadoch, M., D Hoang, C., Shrager, J., Quon, A., Rubin, D. L., Plevritis, S. K., Napel, S. 2018; 5: 180202


    Medical image biomarkers of cancer promise improvements in patient care through advances in precision medicine. Compared to genomic biomarkers, image biomarkers provide the advantages of being non-invasive, and characterizing a heterogeneous tumor in its entirety, as opposed to limited tissue available via biopsy. We developed a unique radiogenomic dataset from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. The dataset comprises Computed Tomography (CT), Positron Emission Tomography (PET)/CT images, semantic annotations of the tumors as observed on the medical images using a controlled vocabulary, and segmentation maps of tumors in the CT scans. Imaging data are also paired with results of gene mutation analyses, gene expression microarrays and RNA sequencing data from samples of surgically excised tumor tissue, and clinical data, including survival outcomes. This dataset was created to facilitate the discovery of the underlying relationship between tumor molecular and medical image features, as well as the development and evaluation of prognostic medical image biomarkers.

    View details for PubMedID 30325352

  • Whole-exome sequencing reveals critical genes underlying metastasis in esophageal squamous cell carcinoma. The Journal of pathology Dai, W., Ko, J. M., Choi, S. S., Yu, Z., Ning, L., Zheng, H., Gopalan, V., Chan, K. T., Lee, N. P., Chan, K. W., Law, S. Y., Lam, A. K., Lung, M. L. 2017


    Esophageal squamous cell carcinoma (ESCC) is one of the most lethal cancers due to a high frequency of metastasis. However, little is known about the genomic landscape of metastatic ESCC. To identify the genetic alterations that underlie ESCC metastasis, whole-exome sequencing (WES) was performed for 41 primary tumors and 15 lymph nodes (LNs) with metastatic ESCC. Eleven cases included matched primary tumors, synchronous LN metastases and non-neoplastic mucosa. Approximately 50-76% of the mutations identified in primary tumors appeared in the synchronous LN metastases. Metastatic ESCC harbors frequent mutations of TP53, KMT2D, ZNF750, and IRF5. Importantly, ZNF750 was recurrently mutated in metastatic ESCC. Combined analysis from current and previous genomic ESCC studies indicated that ZNF750 mutation occurred more frequently in diagnosed cases with LN metastasis than in those without metastasis (14% vs. 3.4%, n = 629, p = 1.78 × 10⁻⁵). The Cancer Genome Atlas (TCGA) data further showed that ZNF750 genetic alterations were associated with early disease relapse. Previous ESCC studies demonstrated that ZNF750 knockdown strongly promotes proliferation, migration and invasion. Collectively, these results suggest a role for ZNF750 as a metastasis suppressor. TP53 is highly mutated in ESCC and missense mutations are associated with poor overall survival, independent of pathological stage, suggesting these missense mutations have important functional impact for tumor progression, and are, thus, likely to be gain-of-function (GOF) mutations. Additionally, mutations of epigenetic regulators, including KMT2D, TET2 and KAT2A, and chromosomal 6p22 and 11q23 deletions of histone variants, important for nucleosome assembly, were detected in 80% of LN metastases. Our study highlights the important role of critical genetic events including ZNF750 mutations, TP53 putative GOF mutations and nucleosome disorganization caused by genetic lesions seen with ESCC metastasis.

    View details for DOI 10.1002/path.4925

    View details for PubMedID 28608921

  • Whole-exome sequencing identifies multiple loss-of-function mutations of NF-kappa B pathway regulators in nasopharyngeal carcinoma PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zheng, H., Dai, W., Cheung, A. K., Ko, J. M., Kan, R., Wong, B. W., Leong, M. M., Deng, M., Kwok, T. C., Chan, J. Y., Kwong, D. L., Lee, A. W., Ng, W. T., Ngan, R. K., Yau, C. C., Tun, S., Lee, V. H., Lam, K., Kwan, C. K., Li, W. S., Yau, S., Chan, K., Lung, M. L. 2016; 113 (40): 11283-11288


    Nasopharyngeal carcinoma (NPC) is an epithelial malignancy with a unique geographical distribution. The genomic abnormalities leading to NPC pathogenesis remain unclear. In total, 135 NPC tumors were examined to characterize the mutational landscape using whole-exome sequencing and targeted resequencing. An APOBEC cytidine deaminase mutagenesis signature was revealed in the somatic mutations. Noticeably, multiple loss-of-function mutations were identified in several NF-κB signaling negative regulators: NFKBIA, CYLD, and TNFAIP3. Functional studies confirmed that inhibition of NFKBIA had a significant impact on NF-κB activity and NPC cell growth. The identified loss-of-function mutations in NFKBIA leading to protein truncation contributed to the altered NF-κB activity, which is critical for NPC tumorigenesis. In addition, somatic mutations were found in several cancer-relevant pathways, including cell cycle-phase transition, cell death, EBV infection, and viral carcinogenesis. These data provide an enhanced road map for understanding the molecular basis underlying NPC.

    View details for DOI 10.1073/pnas.1607606113

    View details for Web of Science ID 000384528900070

    View details for PubMedID 27647909

    View details for PubMedCentralID PMC5056105

  • Genetic and epigenetic landscape of nasopharyngeal carcinoma. Chinese clinical oncology Dai, W., Zheng, H., Cheung, A. K., Lung, M. L. 2016; 5 (2): 16


    Nasopharyngeal carcinoma (NPC) is a unique epithelial malignancy that shows a remarkable geographical and ethnic distribution. Multiple factors including predisposing genetic factors, environmental carcinogens, and Epstein-Barr virus (EBV) infection contribute to the accumulation of genetic and epigenetic alterations leading to NPC development. Emerging technologies now allow us to characterize and understand cancer genomes in detail. Genome-wide studies show that NPC tumors are typically characterized by comparatively low mutation rates, widespread hypermethylation, and frequent copy number alterations and chromosome abnormalities. In this review, we provide an updated overview of the genetic and epigenetic aberrations that likely drive nasopharyngeal tumor development and progression. We integrate the previous knowledge and novel findings from whole-exome sequencing (WES) and methylome studies in NPC, and further discuss the potential use of these findings to identify biomarkers for NPC diagnosis and prognosis.

    View details for DOI 10.21037/cco.2016.03.06

    View details for PubMedID 27121876

  • Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Dai, W., Zheng, H., Cheung, A. K., Tang, C. S., Ko, J. M., Wong, B. W., Leong, M. M., Sham, P. C., Cheung, F., Kwong, D. L., Ngan, R. K., Ng, W. T., Yau, C. C., Pan, J., Peng, X., Tung, S., Zhang, Z., Ji, M., Chiang, A. K., Lee, A. W., Lee, V. H., Lam, K., Au, K. H., Cheng, H. C., Yiu, H. H., Lung, M. L. 2016; 113 (12): 3317-3322


    Multiple factors, including host genetics, environmental factors, and Epstein-Barr virus (EBV) infection, contribute to nasopharyngeal carcinoma (NPC) development. To identify genetic susceptibility genes for NPC, a whole-exome sequencing (WES) study was performed in 161 NPC cases and 895 controls of Southern Chinese descent. The gene-based burden test discovered an association between macrophage-stimulating 1 receptor (MST1R) and NPC. We identified 13 independent cases carrying the MST1R pathogenic heterozygous germ-line variants, and 53.8% of these cases were diagnosed with NPC at or before 20 y of age, indicating that MST1R germ-line variants are relevant to early-age onset (EAO) of the disease (age of ≤20 y). In total, five MST1R missense variants were found in EAO cases but were rare in controls (EAO vs. control, 17.9% vs. 1.2%, P = 7.94 × 10⁻¹²). The validation study, including 2,160 cases and 2,433 controls, showed that the MST1R variant c.G917A:p.R306H is highly associated with NPC (odds ratio of 9.0). MST1R is predominantly expressed in the tissue-resident macrophages and is critical for innate immunity that protects organs from tissue damage and inflammation. Importantly, MST1R expression is detected in the ciliated epithelial cells in normal nasopharyngeal mucosa and plays a role in the cilia motility important for host defense. Although no somatic mutation of MST1R was identified in the sporadic NPC tumors, copy number alterations and promoter hypermethylation at MST1R were often observed. Our findings provide new insights into the pathogenesis of NPC by highlighting the involvement of the MST1R-mediated signaling pathways.

    View details for DOI 10.1073/pnas.1523436113

    View details for Web of Science ID 000372488200060

    View details for PubMedID 26951679

    View details for PubMedCentralID PMC4812767

  • Comparative methylome analysis in solid tumors reveals aberrant methylation at chromosome 6p in nasopharyngeal carcinoma CANCER MEDICINE Dai, W., Cheung, A. K., Ko, J. M., Cheng, Y., Zheng, H., Ngan, R. K., Ng, W. T., Lee, A. W., Yau, C. C., Lee, V. H., Lung, M. L. 2015; 4 (7): 1079-1090


    Altered patterns of DNA methylation are key features of cancer. Nasopharyngeal carcinoma (NPC) has the highest incidence in Southern China. Aberrant methylation at the promoter region of tumor suppressors is frequently reported in NPC; however, genome-wide methylation changes have not been comprehensively investigated. Therefore, we systematically analyzed methylome data in 25 primary NPC tumors and nontumor counterparts using a high-throughput approach with the Illumina HumanMethylation450 BeadChip. For comparison, we examined the methylome data of 11 types of solid tumors collected by The Cancer Genome Atlas (TCGA). In NPC, the hypermethylation pattern was more dominant than hypomethylation and the majority of de novo methylated loci were within or close to CpG islands in tumors. The comparative methylome analysis reveals that hypermethylation at chromosome 6p21.3 frequently occurred in NPC (false discovery rate; FDR = 1.33 × 10⁻⁹), but was less obvious in other types of solid tumors except for prostate and Epstein-Barr virus (EBV)-positive gastric cancer (FDR < 10⁻³). Bisulfite pyrosequencing results further confirmed the aberrant methylation at 6p in an additional patient cohort. Evident enrichment of the repressive mark H3K27me3 and active mark H3K4me3 derived from human embryonic stem cells was found at these regions, indicating that DNA methylation and histone modification function together, leading to epigenetic deregulation in NPC. Our study highlights the importance of epigenetic deregulation in NPC. Polycomb Complex 2 (PRC2), responsible for H3K27 trimethylation, is a promising therapeutic target. A key genomic region on 6p with aberrant methylation was identified. This region contains several important genes having potential use as biomarkers for NPC detection.

    View details for DOI 10.1002/cam4.451

    View details for Web of Science ID 000357899100013

    View details for PubMedID 25924914

    View details for PubMedCentralID PMC4529346

  • Viral-Inducible Argonaute18 Confers Broad-Spectrum Virus Resistance in Rice by Sequestering A Host MicroRNA ELIFE Wu, J., Yang, Z., Wang, Y., Zheng, L., Ye, R., Ji, Y., Zhao, S., Ji, S., Liu, R., Xu, L., Zheng, H., Zhou, Y., Zhang, X., Cao, X., Xie, L., Wu, Z., Qi, Y., Li, Y. 2015; 4


    Viral pathogens are a major threat to rice production worldwide. Although RNA interference (RNAi) is known to mediate antiviral immunity in plant and animal models, the mechanism of antiviral RNAi in rice and other economically important crops is poorly understood. Here, we report that rice resistance to evolutionarily diverse viruses requires Argonaute18 (AGO18). Genetic studies reveal that the antiviral function of AGO18 depends on its activity to sequester microRNA168 (miR168) to alleviate repression of rice AGO1 essential for antiviral RNAi. Expression of miR168-resistant AGO1a in ago18 background rescues or increases rice antiviral activity. Notably, stable transgenic expression of AGO18 confers broad-spectrum virus resistance in rice. Our findings uncover a novel cooperative antiviral activity of two distinct AGO proteins and suggest a new strategy for the control of viral diseases in rice.

    View details for DOI 10.7554/eLife.05733

    View details for Web of Science ID 000349462700004

    View details for PubMedID 25688565

    View details for PubMedCentralID PMC4358150

  • RNA-dependent RNA polymerase 6 of rice (Oryza sativa) plays role in host defense against negative-strand RNA virus, Rice stripe virus VIRUS RESEARCH Jiang, L., Qian, D., Zheng, H., Meng, L., Chen, J., Le, W., Zhou, T., Zhou, Y., Wei, C., Li, Y. 2012; 163 (2): 512-519


    RNA-dependent RNA polymerases (RDRs) from fungi, plants and some invertebrate animals play fundamental roles in antiviral defense. Here, we investigated the role of RDR6 in the defense of economically important rice plants against a negative-strand RNA virus (Rice stripe virus, RSV) that causes enormous crop damage. In three independent transgenic lines (OsRDR6AS line A, B and C) in which OsRDR6 transcription levels were reduced by 70-80% through antisense silencing, the infection and disease symptoms of RSV were shown to be significantly enhanced. The hypersusceptibilities of the OsRDR6AS plants were attributed not to enhanced insect infestation but to enhanced virus infection. The rise in symptoms was associated with the increased accumulation of RSV genomic RNA in the OsRDR6AS plants. The deep sequencing data showed reduced RSV-derived siRNA accumulation in the OsRDR6AS plants compared with the wild type plants. This is the first report of the antiviral role of an RDR in a monocot crop plant in the defense against a negative-strand RNA virus and significantly expands upon the current knowledge of the antiviral roles of RDRs in the defense against different types of viral genomes in numerous groups of plants.

    View details for DOI 10.1016/j.virusres.2011.11.016

    View details for Web of Science ID 000301309400013

    View details for PubMedID 22142475

  • Viral Infection Induces Expression of Novel Phased MicroRNAs from Conserved Cellular MicroRNA Precursors PLOS PATHOGENS Du, P., Wu, J., Zhang, J., Zhao, S., Zheng, H., Gao, G., Wei, L., Li, Y. 2011; 7 (8)


    RNA silencing, mediated by small RNAs including microRNAs (miRNAs) and small interfering RNAs (siRNAs), is a potent antiviral or antibacterial mechanism, besides regulating normal cellular gene expression critical for development and physiology. To gain insights into host small RNA metabolism under infections by different viruses, we used Solexa/Illumina deep sequencing to characterize the small RNA profiles of rice plants infected by two distinct viruses, Rice dwarf virus (RDV, dsRNA virus) and Rice stripe virus (RSV, a negative sense and ambisense RNA virus), respectively, as compared with those from non-infected plants. Our analyses showed that RSV infection enhanced the accumulation of some rice miRNA*s, but not their corresponding miRNAs, as well as accumulation of phased siRNAs from a particular precursor. Furthermore, RSV infection also induced the expression of novel miRNAs in a phased pattern from several conserved miRNA precursors. In comparison, no such changes in host small RNA expression were observed in RDV-infected rice plants. Significantly, RSV infection elevated the expression levels of selective OsDCLs and OsAGOs, whereas RDV infection only affected the expression of certain OsRDRs. Our results provide a comparative analysis, via deep sequencing, of changes in the small RNA profiles and in the genes of RNA silencing machinery induced by different viruses in a natural and economically important crop host plant. They uncover new mechanisms and complexity of virus-host interactions that may have important implications for further studies on the evolution of cellular small RNA biogenesis that impact pathogen infection, pathogenesis, as well as organismal development.

    View details for DOI 10.1371/journal.ppat.1002176

    View details for Web of Science ID 000294298100019

    View details for PubMedID 21901091

    View details for PubMedCentralID PMC3161970