All Publications


  • Metabolic pathway alterations in cerebrospinal fluid as diagnostic biomarkers for primary central nervous system lymphoma. Clinica chimica acta; international journal of clinical chemistry Ma, J., Wang, D., Li, X., Zhang, Y., Tang, Q., Ding, Y., Li, X., Jin, B., Luo, R. Y., Thyparambil, S., Han, Z., Chou, C. J., Zhou, A., Schilling, J., Lin, Z., Ma, Y., Li, Q., Zhang, M., Sylvester, K. G., Nagpal, S., McElhinney, D. B., Ling, X. B., Chen, B. 2025: 120377

    Abstract

    Primary Central Nervous System Lymphoma (PCNSL) is a rare and aggressive type of hematological malignancy that can pose diagnostic challenges. Early detection is critical for effective treatment and better patient outcomes. The goal of this study was to assess the potential of metabolic pathway alterations as diagnostic and differential biomarkers. We conducted a metabolomics analysis from GEO transcriptomic datasets on brain/lymph nodes. Enriched and significant pathways were validated with mass spectrometry in CSF samples from patients with PCNSL, patients with metastatic cancers, and non-malignant controls. Next, we utilized machine learning models to assess the separation performance of PCNSLs from other patients and develop diagnostic and differential diagnosis panels. Key metabolic pathways were discovered from the GEO dataset analysis and were significantly enriched in the CSF of PCNSL patients. Porphyrin metabolism and fatty acid-related pathways were significantly enriched in the diagnostic panel, and the AUC was 0.88. Additionally, aminoacyl-tRNA biosynthesis, glutathione metabolism, and several amino acid pathways were significantly enriched in the differential panel, and the AUC was 0.95. Our study highlights the diagnostic biomarker potential of metabolic pathway alterations in CSF for PCNSL, which could lead to the development of a non-invasive and reliable diagnostic tool for PCNSL.

    View details for DOI 10.1016/j.cca.2025.120377

    View details for PubMedID 40398555

  • Intravenous Immunoglobulin Alone for Coronary Artery Lesion Treatment of Kawasaki Disease: A Randomized Clinical Trial. JAMA network open Kuo, H. C., Lin, M. C., Kao, C. C., Weng, K. P., Ding, Y., Han, Z., Chen, C. J., Jan, S. L., Chien, K. J., Ko, C. H., Lin, C. Y., Lei, W. T., Guo, M. M., Yang, K. D., Sylvester, K. G., Whitin, J. C., Tian, L., Chubb, H., Ceresnak, S. R., McElhinney, D., Cohen, H. J., Ling, X. B. 2025; 8 (4): e253063

    Abstract

    Aspirin (acetylsalicylic acid) and intravenous immunoglobulin (IVIG) are standard treatments for Kawasaki disease (KD) to reduce coronary artery lesions (CALs). However, the optimal duration and dosage of aspirin remain inconsistent across hospitals. The absence of large-scale, multicenter randomized clinical trials hinders a clear understanding of the effectiveness of high-dose aspirin. To evaluate the effectiveness of IVIG alone compared with IVIG combined with high-dose aspirin as the active interventional therapy for KD and to compare treatment effectiveness across various KD subgroups. In this prospective, evaluator-blinded, multicenter noninferiority randomized clinical trial, children (aged <6 years) who had been diagnosed with KD according to American Heart Association criteria were recruited from 5 medical centers in Taiwan and were enrolled between September 1, 2016, and August 31, 2018, with follow-up assessments at 6 weeks and 6 months after treatment. Data were analyzed between January 23, 2023, and January 29, 2024. The standard group received IVIG (2 g/kg) plus high-dose aspirin (80-100 mg/kg per day) until fever subsided for 48 hours. The intervention group received IVIG (2 g/kg) alone. The primary outcome was the occurrence of CALs at 6 weeks. The noninferiority margin was set at 10%. Data analysis was performed using χ² tests for categorical variables; independent t tests for continuous, normally distributed variables; generalized estimating equations for variables without specific distributions at multiple time points; and repeated-measures analysis of variance for continuous variables at multiple time points. The final cohort consisted of 134 patients with KD (mean [SD] age, 1.8 [1.3] years; 82 males [61.2%]), with matched age, weight, height, and sex distributions in 2 groups. Overall, in the IVIG plus aspirin group, among 69 patients, CAL occurrence decreased from 9 (13.0%) at baseline to 2 (2.9%) at 6 weeks and to 1 (1.4%) at 6 months. In the IVIG-only group, among 65 patients, CAL occurrence decreased from 7 (10.8%) at diagnosis to 1 (1.5%) at 6 weeks and to 2 (3.1%) at 6 months. No statistically significant differences in CAL frequency were observed between the 2 groups (0.7 percentage points [95% CI, -4.5 to 5.8 percentage points]; P = .65). There were also no significant differences in the treatment or prophylactic effect. This randomized clinical trial demonstrated the noninferiority of IVIG alone compared with IVIG plus aspirin, with a noninferiority margin set at 10%. The findings suggest that addition of high-dose aspirin during initial IVIG treatment is not clinically meaningful for CAL reduction in children with KD. Future studies on IVIG treatment alone for CAL reduction in KD across diverse racial and ethnic groups, beyond the Asian population, may be necessary to confirm minimal racial and ethnic variability and the broad applicability of these findings. ClinicalTrials.gov Identifier: NCT02951234.

    View details for DOI 10.1001/jamanetworkopen.2025.3063

    View details for PubMedID 40178858

    View details for PubMedCentralID PMC11969286

  • Prediction of risk for early or very early preterm births using high-resolution urinary metabolomic profiling. BMC pregnancy and childbirth Zhang, Y., Sylvester, K. G., Wong, R. J., Blumenfeld, Y. J., Hwa, K. Y., Chou, C. J., Thyparambil, S., Liao, W., Han, Z., Schilling, J., Jin, B., Marić, I., Aghaeepour, N., Angst, M. S., Gaudilliere, B., Winn, V. D., Shaw, G. M., Tian, L., Luo, R. Y., Darmstadt, G. L., Cohen, H. J., Stevenson, D. K., McElhinney, D. B., Ling, X. B. 2024; 24 (1): 783

    Abstract

    Preterm birth (PTB) is a serious health problem. PTB complications are the main cause of death in children under five years of age worldwide. The ability to accurately predict risk for PTB during early pregnancy would allow early monitoring and interventions to provide personalized care, and hence improve outcomes for the mother and infant. This study aims to predict the risks of early preterm (< 35 weeks of gestation) or very early preterm (≤ 26 weeks of gestation) deliveries by using high-resolution maternal urinary metabolomic profiling in early pregnancy. A retrospective cohort study was conducted in two independent preterm and term cohorts using high-density weekly urine sampling. Maternal urine was collected serially at gestational weeks 8 to 24. Global metabolomics approaches were used to profile urine samples with high-resolution mass spectrometry. The significant features associated with preterm outcomes were selected by Gini Importance. Metabolite biomarker identification was performed by liquid chromatography tandem mass spectrometry (LC-MS/MS). XGBoost models were developed to predict early or very early preterm delivery risk. The urine samples included 329 samples from 30 subjects at Stanford University, CA for model development, and 156 samples from 24 subjects at the University of Alabama, Birmingham, AL for validation. Twelve metabolites associated with PTB were selected and identified for modelling among 7,913 metabolic features in serial-collected urine samples of pregnant women. The model to predict early PTB, developed using this set of 12 metabolites, achieved areas under the receiver operating characteristic curve (AUROCs) of 0.995 (95% CI: [0.992, 0.995]) and 0.964 (95% CI: [0.937, 0.964]), and sensitivities of 100% and 97.4% during development and validation testing, respectively. Using the same metabolites, the very early PTB prediction model achieved AUROCs of 0.950 (95% CI: [0.878, 0.950]) and 0.830 (95% CI: [0.687, 0.826]), and sensitivities of 95.0% and 60.0% during development and validation, respectively. Models for predicting risk of early or very early preterm deliveries were developed and tested using metabolic profiling during the 1st and 2nd trimesters of pregnancy. With patient validation studies, risk prediction models may be used to identify at-risk pregnancies prompting alterations in clinical care, and to gain biological insights into preterm birth.

    View details for DOI 10.1186/s12884-024-06974-2

    View details for PubMedID 39587571

    View details for PubMedCentralID PMC11587579

  • Targeted Multiplex Proteomics for the Development and Validation of Biomarkers in Primary Aldosteronism Subtyping. European journal of endocrinology Zhou, F., Ding, Y., Chen, T., Tang, Q., Zhang, J., Thyparambil, S., Jin, B., Han, Z., Chou, C. J., Schilling, J., Luo, R. Y., Tian, H., Sylvester, K. G., Whitin, J. C., Cohen, H. J., McElhinney, D. B., Tian, L., Ling, X. B., Ren, Y. 2024

    Abstract

    Primary aldosteronism (PA), a significant cause of secondary hypertension affecting approximately 10% of patients with severe hypertension, exacerbates cardiovascular and cerebrovascular complications even after blood pressure control. PA is categorized into two main subtypes: unilateral aldosterone-producing adenomas (APA) and bilateral hyperaldosteronism (BHA), each requiring distinct treatment approaches. Accurate subtype classification is crucial for selecting the most effective treatment. The goal of this study was to develop novel blood-based proteomic biomarkers to differentiate between APA and BHA subtypes in patients with PA. Five subtyping differential protein biomarker candidates (APOC3, CD56, CHGA, KRT5, and AZGP1) were identified through targeted proteomic profiling of plasma. The subtyping efficiency of these biomarkers was assessed at both the tissue gene expression and blood protein expression levels. To explore the underlying biology of APA and BHA, significant differential pathways were investigated. The five-protein panel proved highly effective in distinguishing APA from BHA in both tissue and blood samples. By integrating these five protein biomarkers with aldosterone and renin, our blood-based predictive methods achieved remarkable ROC AUCs of 0.986 (95% CI: 0.963-1.000) for differentiating essential hypertension (EH) from PA, and 0.922 (95% CI: 0.846-0.998) for subtyping APA versus BHA. These outcomes surpass the performance of the existing Kobayashi score subtyping system. Furthermore, the study validated differential pathways associated with the pathophysiology of primary aldosteronism, aligning with current scientific knowledge and opening new avenues for advancing PA care. The new blood-based biomarkers for PA subtyping hold the potential to significantly enhance clinical utility and advance the practice of PA care.

    View details for DOI 10.1093/ejendo/lvae148

    View details for PubMedID 39556467

  • Mitochondrial Uncoupler and Retinoic Acid Synergistically Induce Differentiation in Neuroblastoma Liang, N., Jiang, H., Tiche, S. J., Han, Z., Ling, X. B., Shimada, H., Ye, J., Chiu, B. LIPPINCOTT WILLIAMS & WILKINS. 2024: S354
  • Targeted multiplex validation of CSF proteomic biomarkers: implications for differentiation of PCNSL from tumor-free controls and other brain tumors. Frontiers in immunology Ma, J., Lin, Z., Zhang, Y., Ding, Y., Tang, Q., Qian, Y., Jin, B., Luo, R. Y., Liao, W. L., Thyparambil, S., Han, Z., Chou, C. J., Schilling, J., Li, Q., Zhang, M., Lin, Y., Ma, Y., Sylvester, K. G., Nagpal, S., McElhinney, D. B., Ling, X. B., Chen, B. 2024; 15: 1343109

    Abstract

    Primary central nervous system lymphoma (PCNSL) is a rare type of non-Hodgkin's lymphoma that affects brain parenchyma, eyes, cerebrospinal fluid, and spinal cord. Diagnosing PCNSL can be challenging because imaging studies often show patterns similar to those of other brain tumors, and confirmation by stereotactic brain lesion biopsy is invasive and not always possible. This study aimed to validate a previous proteomic profiling (PMID: 32610669) of cerebrospinal fluid (CSF) and develop a CSF-based proteomic panel for accurate PCNSL diagnosis and differentiation. CSF samples were collected from 30 patients with PCNSL, 30 patients with other brain tumors, and 31 tumor-free/benign controls. Liquid chromatography tandem-mass spectrometry targeted proteomics analysis was used to establish CSF-based proteomic panels. Final proteomic panels were selected and optimized to diagnose PCNSL from tumor-free controls or other brain tumor lesions with an area under the curve (AUC) of 0.873 (95% CI: 0.723-0.948) and 0.937 (95% CI: 0.807-0.985), respectively. Pathways analysis showed diagnosis panel features were significantly enriched in pathways related to extracellular matrices-receptor interaction, focal adhesion, and PI3K-Akt signaling, while prion disease, mineral absorption and HIF-1 signaling were significantly enriched with differentiation panel features. This study suggests an accurate clinical test panel for PCNSL diagnosis and differentiation with CSF-based proteomic signatures, which may help overcome the challenges of current diagnostic methods and improve patient outcomes.

    View details for DOI 10.3389/fimmu.2024.1343109

    View details for PubMedID 39144147

    View details for PubMedCentralID PMC11322575

  • Global metabolomics revealed deviations from the metabolic aging clock in colorectal cancer patients. Theranostics Zhang, L., Mo, S., Zhu, X., Chou, C. J., Jin, B., Han, Z., Schilling, J., Liao, W., Thyparambil, S., Luo, R. Y., Whitin, J. C., Tian, L., Nagpal, S., Ceresnak, S. R., Cohen, H. J., McElhinney, D. B., Sylvester, K. G., Gong, Y., Fu, C., Ling, X. B., Peng, J. 2024; 14 (4): 1602-1614

    Abstract

    Background: Markers of aging hold promise in the context of colorectal cancer (CRC) care. Utilizing high-resolution metabolomic profiling, we can unveil distinctive age-related patterns that have the potential to predict early CRC development. Our study aims to unearth a panel of aging markers and delve into the metabolomic alterations associated with aging and CRC. Methods: We assembled a serum cohort comprising 5,649 individuals, consisting of 3,002 healthy volunteers, 715 patients diagnosed with colorectal advanced precancerous lesions (APL), and 1,932 CRC patients, to perform a comprehensive metabolomic analysis. Results: We successfully identified unique age-associated patterns across 42 metabolic pathways. Moreover, we established a metabolic aging clock, comprising 9 key metabolites, using an elastic net regularized regression model that accurately estimates chronological age. Notably, we observed significant chronological disparities among the healthy population, APL patients, and CRC patients. By combining the analysis of circulative carcinoembryonic antigen levels with the categorization of individuals into the "hypo" metabolic aging subgroup, our blood test demonstrates the ability to detect APL and CRC with positive predictive values of 68.4% (64.3%, 72.2%) and 21.4% (17.8%, 25.9%), respectively. Conclusions: This innovative approach utilizing our metabolic aging clock holds significant promise for accurately assessing biological age and enhancing our capacity to detect APL and CRC.

    View details for DOI 10.7150/thno.87303

    View details for PubMedID 38389840

    View details for PubMedCentralID PMC10879879

  • Exploring the feasibility of using long-term stored newborn dried blood spots to identify metabolic features for congenital heart disease screening. Biomarker research Ceresnak, S. R., Zhang, Y., Ling, X. B., Su, K. J., Tang, Q., Jin, B., Schilling, J., Chou, C. J., Han, Z., Floyd, B. J., Whitin, J. C., Hwa, K. Y., Sylvester, K. G., Chubb, H., Luo, R. Y., Tian, L., Cohen, H. J., McElhinney, D. B. 2023; 11 (1): 97

    Abstract

    Congenital heart disease (CHD) represents a significant contributor to both morbidity and mortality in neonates and children. There's currently no analogous dried blood spot (DBS) screening for CHD immediately after birth. This study was set to assess the feasibility of using DBS to identify reliable metabolite biomarkers with clinical relevance, with the aim to screen and classify CHD utilizing the DBS. We assembled a cohort of DBS datasets from the California Department of Public Health (CDPH) Biobank, encompassing both normal controls and three pre-defined CHD categories. A DBS-based quantitative metabolomics method was developed using liquid chromatography with tandem mass spectrometry (LC-MS/MS). We conducted a correlation analysis comparing the absolute quantitated metabolite concentration in DBS against the CDPH NBS records to verify the reliability of metabolic profiling. For hydrophilic and hydrophobic metabolites, we executed significant pathway and metabolite analyses respectively. Logistic and LightGBM models were established to aid in CHD discrimination and classification. Consistent and reliable quantification of metabolites was demonstrated in DBS samples stored for up to 15 years. We discerned dysregulated metabolic pathways in CHD patients, including deviations in lipid and energy metabolism, as well as oxidative stress pathways. Furthermore, we identified three metabolites and twelve metabolites as potential biomarkers for CHD assessment and subtype classification, respectively. This study is the first to confirm the feasibility of validating metabolite profiling results using long-term stored DBS samples. Our findings highlight the potential clinical applications of our DBS-based methods for CHD screening and subtype classification.

    View details for DOI 10.1186/s40364-023-00536-y

    View details for PubMedID 37957758

    View details for PubMedCentralID PMC10644604

  • High-throughput quantitation of amino acids and acylcarnitine in cerebrospinal fluid: identification of PCNSL biomarkers and potential metabolic messengers. Frontiers in molecular biosciences Ma, J., Chen, K., Ding, Y., Li, X., Tang, Q., Jin, B., Luo, R. Y., Thyparambil, S., Han, Z., Chou, C. J., Zhou, A., Schilling, J., Lin, Z., Ma, Y., Li, Q., Zhang, M., Sylvester, K. G., Nagpal, S., McElhinney, D. B., Ling, X. B., Chen, B. 2023; 10: 1257079

    Abstract

    Background: Due to the poor prognosis and rising occurrence, there is a crucial need to improve the diagnosis of Primary Central Nervous System Lymphoma (PCNSL), which is a rare type of non-Hodgkin's lymphoma. This study utilized targeted metabolomics of cerebrospinal fluid (CSF) to identify biomarker panels for the improved diagnosis or differential diagnosis of primary central nervous system lymphoma (PCNSL). Methods: In this study, a cohort of 68 individuals, including patients with primary central nervous system lymphoma (PCNSL), non-malignant disease controls, and patients with other brain tumors, was recruited. Their cerebrospinal fluid samples were analyzed using the ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) technique for targeted metabolomics analysis. Multivariate statistical analysis and logistic regression modeling were employed to identify biomarkers for both diagnosis (Dx) and differential diagnosis (Diff) purposes. The Dx and Diff models were further validated using a separate cohort of 34 subjects through logistic regression modeling. Results: A targeted analysis of 45 metabolites was conducted using UHPLC-MS/MS on cerebrospinal fluid (CSF) samples from a cohort of 68 individuals, including PCNSL patients, non-malignant disease controls, and patients with other brain tumors. Five metabolic features were identified as biomarkers for PCNSL diagnosis, while nine metabolic features were found to be biomarkers for differential diagnosis. Logistic regression modeling was employed to validate the Dx and Diff models using an independent cohort of 34 subjects. The logistic model demonstrated excellent performance, with an AUC of 0.83 for PCNSL vs. non-malignant disease controls and 0.86 for PCNSL vs. other brain tumor patients. Conclusion: Our study has successfully developed two logistic regression models utilizing metabolic markers in cerebrospinal fluid (CSF) for the diagnosis and differential diagnosis of PCNSL. These models provide valuable insights and hold promise for the future development of a non-invasive and reliable diagnostic tool for PCNSL.

    View details for DOI 10.3389/fmolb.2023.1257079

    View details for PubMedID 38028545

    View details for PubMedCentralID PMC10644155

  • Altered expression of the L-arginine/nitric oxide pathway in ovarian cancer: metabolic biomarkers and biological implications. BMC cancer Chen, L., Tang, Q., Zhang, K., Huang, Q., Ding, Y., Jin, B., Liu, S., Hwa, K., Chou, C. J., Zhang, Y., Thyparambil, S., Liao, W., Han, Z., Mortensen, R., Schilling, J., Li, Z., Heaton, R., Tian, L., Cohen, H. J., Sylvester, K. G., Arent, R. C., Zhao, X., McElhinney, D. B., Wu, Y., Bai, W., Ling, X. B. 2023; 23 (1): 844

    Abstract

    Ovarian cancer (OC) is a highly lethal gynecological malignancy. Extensive research has shown that OC cells undergo significant metabolic alterations during tumorigenesis. In this study, we aim to leverage these metabolic changes as potential biomarkers for assessing ovarian cancer. A functional module-based approach was utilized to identify key gene expression pathways that distinguish different stages of ovarian cancer (OC) within a tissue biopsy cohort. This cohort consisted of control samples (n = 79), stage I/II samples (n = 280), and stage III/IV samples (n = 1016). To further explore these altered molecular pathways, minimal spanning tree (MST) analysis was applied, leading to the formulation of metabolic biomarker hypotheses for OC liquid biopsy. To validate, a multiple reaction monitoring (MRM) based quantitative LC-MS/MS method was developed. This method allowed for the precise quantification of targeted metabolite biomarkers using an OC blood cohort comprising control samples (n = 464), benign samples (n = 3), and OC samples (n = 13). Eleven functional modules were identified as significant differentiators (false discovery rate, FDR < 0.05) between normal and early-stage, or early-stage and late-stage ovarian cancer (OC) tumor tissues. MST analysis revealed that the metabolic L-arginine/nitric oxide (L-ARG/NO) pathway was reprogrammed, and the modules related to "DNA replication" and "DNA repair and recombination" served as anchor modules connecting the other nine modules. Based on this analysis, symmetric dimethylarginine (SDMA) and arginine were proposed as potential liquid biopsy biomarkers for OC assessment. Our quantitative LC-MS/MS analysis on our OC blood cohort provided direct evidence supporting the use of the SDMA-to-arginine ratio as a liquid biopsy panel to distinguish between normal and OC samples, with an area under the ROC curve (AUC) of 98.3%. Our comprehensive analysis of tissue genomics and blood quantitative LC-MS/MS metabolic data shed light on the metabolic reprogramming underlying OC pathophysiology. These findings offer new insights into the potential diagnostic utility of the SDMA-to-arginine ratio for OC assessment. Further validation studies using adequately powered OC cohorts are warranted to fully establish the clinical effectiveness of this diagnostic test.

    View details for DOI 10.1186/s12885-023-11192-8

    View details for PubMedID 37684587

    View details for PubMedCentralID 8192829

  • Development of a Urine Metabolomics Biomarker-Based Prediction Model for Preeclampsia during Early Pregnancy. Metabolites Zhang, Y., Sylvester, K. G., Jin, B., Wong, R. J., Schilling, J., Chou, C. J., Han, Z., Luo, R. Y., Tian, L., Ladella, S., Mo, L., Maric, I., Blumenfeld, Y. J., Darmstadt, G. L., Shaw, G. M., Stevenson, D. K., Whitin, J. C., Cohen, H. J., McElhinney, D. B., Ling, X. B. 2023; 13 (6)

    Abstract

    Preeclampsia (PE) is a condition that poses a significant risk of maternal mortality and multiple organ failure during pregnancy. Early prediction of PE can enable timely surveillance and interventions, such as low-dose aspirin administration. In this study, conducted at Stanford Health Care, we examined a cohort of 60 pregnant women and collected 478 urine samples between gestational weeks 8 and 20 for comprehensive metabolomic profiling. By employing liquid chromatography mass spectrometry (LCMS/MS), we identified the structures of seven out of 26 metabolomics biomarkers detected. Utilizing the XGBoost algorithm, we developed a predictive model based on these seven metabolomics biomarkers to identify individuals at risk of developing PE. The performance of the model was evaluated using 10-fold cross-validation, yielding an area under the receiver operating characteristic curve of 0.856. Our findings suggest that measuring urinary metabolomics biomarkers offers a noninvasive approach to assess the risk of PE prior to its onset.

    View details for DOI 10.3390/metabo13060715

    View details for PubMedID 37367874

  • Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ precision oncology Huang, Z., Shao, W., Han, Z., Alkashash, A. M., De la Sancha, C., Parwani, A. V., Nitta, H., Hou, Y., Wang, T., Salama, P., Rizkalla, M., Zhang, J., Huang, K., Li, Z. 2023; 7 (1): 14

    Abstract

    Advances in computational algorithms and tools have made the prediction of cancer patient outcomes using computational pathology feasible. However, predicting clinical outcomes from pre-treatment histopathologic images remains a challenging task, limited by the poor understanding of tumor immune micro-environments. In this study, an automatic, accurate, comprehensive, interpretable, and reproducible whole slide image (WSI) feature extraction pipeline, known as IMage-based Pathological REgistration and Segmentation Statistics (IMPRESS), is described. Using both H&E and multiplex IHC (PD-L1, CD8+, and CD163+) images, we investigated whether artificial intelligence (AI)-based algorithms using automatic feature extraction methods can predict neoadjuvant chemotherapy (NAC) outcomes in HER2-positive (HER2+) and triple-negative breast cancer (TNBC) patients. Features are derived from tumor immune micro-environment and clinical data and used to train machine learning models to accurately predict the response to NAC in breast cancer patients (HER2+ AUC=0.8975; TNBC AUC=0.7674). The results demonstrate that this method outperforms models trained on features that were manually generated by pathologists. The developed image features and algorithms were further externally validated by independent cohorts, yielding encouraging results, especially for the HER2+ subtype.

    View details for DOI 10.1038/s41698-023-00352-5

    View details for PubMedID 36707660

  • Single center blind testing of a US multi-center validated diagnostic algorithm for Kawasaki disease in Taiwan. Frontiers in immunology Kuo, H. C., Hao, S., Jin, B., Chou, C. J., Han, Z., Chang, L. S., Huang, Y. H., Hwa, K., Whitin, J. C., Sylvester, K. G., Reddy, C. D., Chubb, H., Ceresnak, S. R., Kanegaye, J. T., Tremoulet, A. H., Burns, J. C., McElhinney, D., Cohen, H. J., Ling, X. B. 2022; 13: 1031387

    Abstract

    Kawasaki disease (KD) is the leading cause of acquired heart disease in children. The major challenge in KD diagnosis is that it shares clinical signs with other childhood febrile control (FC) subjects. We sought to determine if our algorithmic approach applied to a Taiwan cohort. A single-center (Chang Gung Memorial Hospital in Taiwan) cohort of patients suspected of having acute KD was prospectively enrolled by local KD specialists for KD analysis. Our computer-based two-step algorithm, previously developed at a single center, was further tested in a five-center validation in the US. This first blinded multi-center trial validated our approach, with sufficient sensitivity and positive predictive value, to identify most patients with KD diagnosed at centers across the US. This study involved 418 KDs and 259 FCs from the Chang Gung Memorial Hospital in Taiwan. Our diagnostic algorithm retained sensitivity (379 of 418; 90.7%), specificity (223 of 259; 86.1%), PPV (379 of 409; 92.7%), and NPV (223 of 247; 90.3%) comparable to the previous US 2016 single-center and US 2020 five-center results. Only 4.7% (15 of 418) of KD and 2.3% (6 of 259) of FC patients were identified as indeterminate. The algorithm identified 18 of 50 (36%) KD patients who presented 2 or 3 principal criteria. Of 418 KD patients, 157 were infants younger than one year and 89.2% (140 of 157) were classified correctly. Of the 44 patients with KD who had coronary artery abnormalities, our diagnostic algorithm correctly identified 43 (97.7%), including all patients with dilated coronary arteries but one, whose dilation was found to resolve in 8 weeks. This work demonstrates the applicability of our algorithmic approach and diagnostic portability in Taiwan.

    View details for DOI 10.3389/fimmu.2022.1031387

    View details for PubMedID 36263040

    View details for PubMedCentralID PMC9575935

  • Gestational Dating by Urine Metabolic Profile at High Resolution Weekly Sampling Timepoints: Discovery and Validation. Frontiers in molecular medicine Sylvester, K. G., Hao, S., Li, Z., Han, Z., Tian, L., Ladella, S., Wong, R. J., Shaw, G. M., Stevenson, D. K., Cohen, H. J., Whitin, J. C., McElhinney, D. B., Ling, X. B. 2022; 2: 844280

    Abstract

    Background: Pregnancy triggers longitudinal metabolic alterations in women to allow precisely-programmed fetal growth. Comprehensive characterization of such a "metabolic clock" of pregnancy may provide a molecular reference in relation to studies of adverse pregnancy outcomes. However, a high-resolution temporal profile of metabolites along a healthy pregnancy remains to be defined. Methods: Two independent, normal pregnancy cohorts with high-density weekly urine sampling (discovery: 478 samples from 19 subjects at California; validation: 171 samples from 10 subjects at Alabama) were studied. Urine samples were profiled by liquid chromatography-mass spectrometry (LC-MS) for untargeted metabolomics, which was applied for gestational age dating and prediction of time to delivery. Results: 5,473 urinary metabolic features were identified. Partial least-squares discriminant analysis on features with robust signals (n = 1,716) revealed that the samples were distributed on the basis of the first two principal components according to their gestational age. Pathways of bile secretion, steroid hormone biosynthesis, pantothenate and CoA biosynthesis, benzoate degradation, and phenylpropanoid biosynthesis were significantly regulated, which was collectively applied to discover and validate a predictive model that accurately captures the chronology of pregnancy. With six urine metabolites (acetylcholine, estriol-3-glucuronide, dehydroepiandrosterone sulfate, α-lactose, hydroxyhexanoyl-carnitine, and l-carnitine), models were constructed based on gradient-boosting decision trees to date gestational age in high accordance with ultrasound results, and to accurately predict time to delivery. Conclusion: Our study characterizes the weekly baseline profile of the human pregnancy metabolome, which provides a high-resolution molecular reference for future studies of adverse pregnancy outcomes.

    View details for DOI 10.3389/fmmed.2022.844280

    View details for PubMedID 39086969

    View details for PubMedCentralID PMC11285704

  • Deviation from the precisely timed age-associated patterns revealed by blood metabolomics to find CRC patients at risk of relapse at the CRC diagnosis Thyparambil, S. P., Zhu, X., Zhang, Y., Sun, H., Peng, J., Cai, S., Li, Y., Fu, C., Bao, P., Hao, S., Li, Z., Ding, Y., Yao, X., Liao, W., Heaton, R., Han, Z., Tian, L., Schilling, J., Sylvester, K. G., Ling, X. LIPPINCOTT WILLIAMS & WILKINS. 2022
  • Serological Phenotyping Analysis Uncovers a Unique Metabolomic Pattern Associated With Early Onset of Type 2 Diabetes Mellitus. Frontiers in molecular biosciences Zhu, L., Huang, Q., Li, X., Jin, B., Ding, Y., Chou, C. J., Su, K., Zhang, Y., Chen, X., Hwa, K. Y., Thyparambil, S., Liao, W., Han, Z., Mortensen, R., Jin, Y., Li, Z., Schilling, J., Li, Z., Sylvester, K. G., Sun, X., Ling, X. B. 2022; 9: 841209

    Abstract

    Background: Type 2 diabetes mellitus (T2DM) is a multifaceted disorder that has reached epidemic proportions globally. Defective insulin secretion by pancreatic beta-cells and the inability of insulin-sensitive tissues to respond effectively to insulin are the underlying biology of T2DM. However, circulating biomarkers indicative of early diabetic onset at the asymptomatic stage have not been well described. We hypothesized that global and targeted mass spectrometry (MS) based metabolomic discovery can identify novel serological metabolic biomarkers specifically associated with T2DM. We further hypothesized that these markers can have a unique pattern associated with latent or early asymptomatic stage, promising an effective liquid biopsy approach for population T2DM risk stratification and screening. Methods: Four independent cohorts were assembled for the study. The T2DM cohort included sera from 25 patients with T2DM and 25 healthy individuals for the biomarker discovery and sera from 15 patients with T2DM and 15 healthy controls for the testing. The Pre-T2DM cohort included sera from 76 patients with prediabetes and 62 healthy controls for the model training and sera from 35 patients with prediabetes and 27 healthy controls for the model testing. Both global and targeted (amino acid, acylcarnitine, and fatty acid) approaches were used to deep phenotype the serological metabolome by high performance liquid chromatography-high resolution mass spectrometry. Different machine learning approaches (Random Forest, XGBoost, and ElasticNet) were applied to model the unique T2DM/Pre-T2DM metabolic patterns and contrasted for their effectiveness in differentiating T2DM/Pre-T2DM from controls. Results: The univariate analysis identified a unique panel of metabolites (n = 22) significantly associated with T2DM. Global metabolomics and subsequent structure determination led to the identification of 8 T2DM biomarkers while targeted LCMS profiling discovered 14 T2DM biomarkers. Our panel can effectively differentiate T2DM (ROC AUC = 1.00) or Pre-T2DM (ROC AUC = 0.84) from the controls in the respective testing cohort. Conclusion: Our serological metabolite panel can be utilized to identify asymptomatic populations at risk of T2DM at an early stage of diabetic development, allowing for clinical intervention. This early detection would guide enhanced levels of care and accelerate development of clinical strategies to prevent T2DM.

    View details for DOI 10.3389/fmolb.2022.841209

    View details for PubMedID 35463946

  • Weakly Supervised Deep Ordinal Cox Model for Survival Prediction From Whole-Slide Pathological Images. IEEE transactions on medical imaging Shao, W., Wang, T., Huang, Z., Han, Z., Zhang, J., Huang, K. 2021; 40 (12): 3739-3747

    Abstract

    Whole-Slide Histopathology Image (WSI) is generally considered the gold standard for cancer diagnosis and prognosis. Given the large inter-operator variation among pathologists, there is an imperative need to develop machine learning models based on WSIs for consistently predicting patient prognosis. The existing WSI-based prediction methods do not utilize the ordinal ranking loss to train the prognosis model, and thus cannot model the strong ordinal information among different patients in an efficient way. Another challenge is that a WSI is of large size (e.g., 100,000-by-100,000 pixels) with heterogeneous patterns but often only annotated with a single WSI-level label, which further complicates the training process. To address these challenges, we consider the ordinal characteristic of the survival process by adding a ranking-based regularization term on the Cox model and propose a weakly supervised deep ordinal Cox model (BDOCOX) for survival prediction from WSIs. Here, we generate a number of bags from WSIs, and each bag is comprised of the image patches representing the heterogeneous patterns of WSIs, which are assumed to match the WSI-level labels for training the proposed model. The effectiveness of the proposed method is well validated by theoretical analysis as well as the prognosis and patient stratification results on three cancer datasets from The Cancer Genome Atlas (TCGA).

    View details for DOI 10.1109/TMI.2021.3097319

    View details for PubMedID 34264823

  • Applying interpretable deep learning models to identify chronic cough patients using EHR data. Computer methods and programs in biomedicine Luo, X., Gandhi, P., Zhang, Z., Shao, W., Han, Z., Chandrasekaran, V., Turzhitsky, V., Bali, V., Roberts, A. R., Metzger, M., Baker, J., La Rosa, C., Weaver, J., Dexter, P., Huang, K. 2021; 210: 106395

    Abstract

    Chronic cough (CC) affects approximately 10% of adults. Many disease states are associated with chronic cough, such as asthma, upper airway cough syndrome, bronchitis, and gastroesophageal reflux disease. The lack of an ICD code specific for chronic cough makes it challenging to identify such patients from electronic health records (EHRs). For clinical and research purposes, computational methods using EHR data are urgently needed to identify chronic cough cases. This research aims to investigate the data representations and deep learning algorithms for chronic cough prediction. Utilizing real-world EHR data from a large academic healthcare system from October 2005 to September 2015, we investigated Natural Language Representation of the EHR data and systematically evaluated deep learning and traditional machine learning models to predict chronic cough patients. We built these machine learning models using structured data (medication and diagnosis) and unstructured data (clinical notes). The sensitivity and specificity of a transformer-based deep learning algorithm, specifically BERT with attention model, were 0.856 and 0.866, respectively, using structured data (medication and diagnosis). Sensitivity and specificity improved to 0.952 and 0.930 when we combined structured data with symptoms extracted from clinical notes. We further found that the attention mechanism of deep learning models can be used to extract important features that drive the prediction decisions. Compared with our previously published rule-based algorithm, the deep learning algorithm can identify more chronic cough patients with structured data. By applying deep learning models, chronic cough patients can be reliably identified for prospective or retrospective research through medication and diagnosis data, widely available in EHR and electronic claims data, thus improving the generalizability of the patient identification algorithm. Deep learning models can identify chronic cough patients with even higher sensitivity and specificity when structured and unstructured EHR data are utilized. We anticipate language-based data representation and deep learning models developed in this research could also be productively used for other disease prediction and case identification.

    View details for DOI 10.1016/j.cmpb.2021.106395

    View details for PubMedID 34525412

  • Crowdsourcing digital health measures to predict Parkinson's disease severity: the Parkinson's Disease Digital Biomarker DREAM Challenge. NPJ digital medicine Sieberts, S. K., Schaff, J., Duda, M., Pataki, B. Á., Sun, M., Snyder, P., Daneault, J. F., Parisi, F., Costante, G., Rubin, U., Banda, P., Chae, Y., Chaibub Neto, E., Dorsey, E. R., Aydın, Z., Chen, A., Elo, L. L., Espino, C., Glaab, E., Goan, E., Golabchi, F. N., Görmez, Y., Jaakkola, M. K., Jonnagaddala, J., Klén, R., Li, D., McDaniel, C., Perrin, D., Perumal, T. M., Rad, N. M., Rainaldi, E., Sapienza, S., Schwab, P., Shokhirev, N., Venäläinen, M. S., Vergara-Diaz, G., Zhang, Y., Wang, Y., Guan, Y., Brunner, D., Bonato, P., Mangravite, L. M., Omberg, L. 2021; 4 (1): 53

    Abstract

    Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).

    View details for DOI 10.1038/s41746-021-00414-7

    View details for PubMedID 33742069

    View details for PubMedCentralID PMC7979931

  • Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers. Medical image analysis Shao, W., Wang, T., Sun, L., Dong, T., Han, Z., Huang, Z., Zhang, J., Zhang, D., Huang, K. 2020; 65: 101795

    Abstract

    With the tremendous development of artificial intelligence, many machine learning algorithms have been applied to the diagnosis of human cancers. Recently, rather than predicting categorical variables (e.g., stages and subtypes) as in cancer diagnosis, several prognosis prediction models based on patients' survival information have been adopted to estimate the clinical outcome of cancer patients. However, most existing studies treat the diagnosis and prognosis tasks separately. In fact, the diagnosis information (e.g., TNM Stages) indicates the extent of the disease severity that is highly correlated with the patients' survival. While the diagnosis is largely made based on histopathological images, recent studies have also demonstrated that integrative analysis of histopathological images and genomic data can hold great promise for improving the diagnosis and prognosis of cancers. However, direct combination of these two types of data may bring redundant features that will negatively affect the prediction performance. Therefore, it is necessary to select informative features from the derived multi-modal data. Based on the above considerations, we propose a multi-task multi-modal feature selection method for joint diagnosis and prognosis of cancers. Specifically, we make use of the task relationship learning framework to automatically discover the relationships between the diagnosis and prognosis tasks, through which we can identify important image and genomics features for both tasks. In addition, we add a regularization term to ensure that the correlation within the multi-modal data can be captured. We evaluate our method on three cancer datasets from The Cancer Genome Atlas project, and the experimental results verify that our method can achieve better performance on both diagnosis and prognosis tasks than the related methods.

    View details for DOI 10.1016/j.media.2020.101795

    View details for PubMedID 32745975

  • Deep-Learning-Based Characterization of Tumor-Infiltrating Lymphocytes in Breast Cancers From Histopathology Images and Multiomics Data. JCO clinical cancer informatics Lu, Z., Xu, S., Shao, W., Wu, Y., Zhang, J., Han, Z., Feng, Q., Huang, K. 2020; 4: 480-490

    Abstract

    Tumor-infiltrating lymphocytes (TILs) and their spatial characterizations on whole-slide images (WSIs) of histopathology sections have become crucial in diagnosis, prognosis, and treatment response prediction for different cancers. However, fully automatic assessment of TILs on WSIs currently remains a great challenge because of the heterogeneity and large size of WSIs. We present an automatic pipeline based on a cascade-training U-net to generate high-resolution TIL maps on WSIs. We present global cell-level TIL maps and 43 quantitative TIL spatial image features for 1,000 WSIs of The Cancer Genome Atlas patients with breast cancer. For more specific analysis, all the patients were divided into three subtypes, namely, estrogen receptor (ER)-positive, ER-negative, and triple-negative groups. The associations between TIL scores and gene expression and somatic mutation were examined separately in three breast cancer subtypes. Both univariate and multivariate survival analyses were performed on 43 TIL image features to examine the prognostic value of TIL spatial patterns in different breast cancer subtypes. The TIL score was in strong association with immune response pathway and genes (eg, programmed death-1 and CTLA4). Different breast cancer subtypes showed TIL score in association with mutations from different genes, suggesting that different genetic alterations may lead to similar phenotypes. Spatial TIL features that represent density and distribution of TIL clusters were important indicators of the patient outcomes. Our pipeline can facilitate computational pathology-based discovery in cancer immunology and research on immunotherapy. Our analysis results are available for the research community to generate new hypotheses and insights on breast cancer immunology and development.

    View details for DOI 10.1200/CCI.19.00126

    View details for PubMedID 32453636

    View details for PubMedCentralID PMC7265782

  • Computational analysis of pathological images enables a better diagnosis of TFE3 Xp11.2 translocation renal cell carcinoma. Nature communications Cheng, J., Han, Z., Mehra, R., Shao, W., Cheng, M., Feng, Q., Ni, D., Huang, K., Cheng, L., Zhang, J. 2020; 11 (1): 1778

    Abstract

    TFE3 Xp11.2 translocation renal cell carcinoma (TFE3-RCC) generally progresses more aggressively compared with other RCC subtypes, but it is challenging to diagnose TFE3-RCC by traditional visual inspection of pathological images. In this study, we collect hematoxylin and eosin-stained histopathology whole-slide images of 74 TFE3-RCC cases (the largest cohort to date) and 74 clear cell RCC cases (ccRCC, the most common RCC subtype) with matched gender and tumor grade. An automatic computational pipeline is implemented to extract image features. Comparative study identifies 52 image features with significant differences between TFE3-RCC and ccRCC. Machine learning models are built to distinguish TFE3-RCC from ccRCC. Tests of the classification models on an external validation set reveal high accuracy with areas under ROC curve ranging from 0.842 to 0.894. Our results suggest that automatically derived image features can capture subtle morphological differences between TFE3-RCC and ccRCC and contribute to a potential guideline for TFE3-RCC diagnosis.

    View details for DOI 10.1038/s41467-020-15671-5

    View details for PubMedID 32286325

    View details for PubMedCentralID PMC7156652

  • Deep learning-based cancer survival prognosis from RNA-seq data: approaches and evaluations. BMC medical genomics Huang, Z., Johnson, T. S., Han, Z., Helm, B., Cao, S., Zhang, C., Salama, P., Rizkalla, M., Yu, C. Y., Cheng, J., Xiang, S., Zhan, X., Zhang, J., Huang, K. 2020; 13 (Suppl 5): 41

    Abstract

    Recent advances in kernel-based Deep Learning models have introduced a new era in medical research. Originally designed for pattern recognition and image processing, Deep Learning models are now applied to survival prognosis of cancer patients. Specifically, Deep Learning versions of the Cox proportional hazards models are trained with transcriptomic data to predict survival outcomes in cancer patients. In this study, a broad analysis was performed on TCGA cancers using a variety of Deep Learning-based models, including Cox-nnet, DeepSurv, and a method proposed by our group named AECOX (AutoEncoder with Cox regression network). Concordance index and p-value of the log-rank test are used to evaluate the model performances. All models show competitive results across 12 cancer types. The last hidden layers of the Deep Learning approaches are lower dimensional representations of the input data that can be used for feature reduction and visualization. Furthermore, the prognosis performances reveal a negative correlation between model accuracy, overall survival time statistics, and tumor mutation burden (TMB), suggesting an association among overall survival time, TMB, and prognosis prediction accuracy. Deep Learning based algorithms demonstrate superior performance to traditional machine learning based models. The cancer prognosis results measured in concordance index are indistinguishable across models but are highly variable across cancers. These findings shed some light on the relationships between patient characteristics and survival learnability on a pan-cancer level.

    View details for DOI 10.1186/s12920-020-0686-1

    View details for PubMedID 32241264

    View details for PubMedCentralID PMC7118823

  • Integrative Analysis of Pathological Images and Multi-Dimensional Genomic Data for Early-Stage Cancer Prognosis. IEEE transactions on medical imaging Shao, W., Han, Z., Cheng, J., Cheng, L., Wang, T., Sun, L., Lu, Z., Zhang, J., Zhang, D., Huang, K. 2020; 39 (1): 99-110

    Abstract

    The integrative analysis of histopathological images and genomic data has received increasing attention for studying the complex mechanisms driving cancers. However, most image-genomic studies have been restricted to combining histopathological images with a single modality of genomic data (e.g., mRNA transcription or genetic mutation), and thus neglect the fact that the molecular architecture of cancer is manifested at multiple levels, including genetic, epigenetic, transcriptional, and post-transcriptional events. To address this issue, we propose a novel ordinal multi-modal feature selection (OMMFS) framework that can simultaneously identify important features from both pathological images and multi-modal genomic data (i.e., mRNA transcription, copy number variation, and DNA methylation data) for the prognosis of cancer patients. Our model is based on a generalized sparse canonical correlation analysis framework, by which we also take advantage of the ordinal survival information among different patients for survival outcome prediction. We evaluate our method on three early-stage cancer datasets derived from The Cancer Genome Atlas (TCGA) project, and the experimental results demonstrate that both the selected image and multi-modal genomic markers are strongly correlated with survival, enabling more effective stratification of patients with distinct survival than the compared methods, which is often difficult for early-stage cancer patients.

    View details for DOI 10.1109/TMI.2019.2920608

    View details for PubMedID 31170067

  • Correlation Analysis of Histopathology and Proteogenomics Data for Breast Cancer MOLECULAR & CELLULAR PROTEOMICS Zhan, X., Cheng, J., Huang, Z., Han, Z., Helm, B., Liu, X., Zhang, J., Wang, T., Ni, D., Huang, K. 2019; 18 (8): S37-S51
  • SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer. Frontiers in genetics Huang, Z., Zhan, X., Xiang, S., Johnson, T. S., Helm, B., Yu, C. Y., Zhang, J., Salama, P., Rizkalla, M., Han, Z., Huang, K. 2019; 10: 166

    Abstract

    Improved cancer prognosis is a central goal for precision health medicine. Though many models can predict differential survival from data, there is a strong need for sophisticated algorithms that can aggregate and filter relevant predictors from increasingly complex data inputs. In turn, these models should provide deeper insight into which types of data are most relevant to improve prognosis. Deep Learning-based neural networks offer a potential solution for both problems because they are highly flexible and account for data complexity in a non-linear fashion. In this study, we implement Deep Learning-based networks to determine how gene expression data predicts Cox regression survival in breast cancer. We accomplish this through an algorithm called SALMON (Survival Analysis Learning with Multi-Omics Neural Networks), which aggregates and simplifies gene expression data and cancer biomarkers to enable prognosis prediction. The results revealed improved performance when more omics data were used in model construction. Rather than use raw gene expression values as model inputs, we innovatively use eigengene modules from the result of gene co-expression network analysis. The corresponding high impact co-expression modules and other omics data are identified by a feature selection technique, then examined by conducting enrichment analysis and exploring biological functions, which escalates the interpretation of input features from the gene level to the co-expression module level. Our study shows the feasibility of discovering breast cancer related co-expression modules and sketches a blueprint for future endeavors on Deep Learning-based survival analysis. SALMON source code is available at https://github.com/huangzhii/SALMON/.

    View details for DOI 10.3389/fgene.2019.00166

    View details for PubMedID 30906311

    View details for PubMedCentralID PMC6419526

  • Integrative analysis based on survival associated co-expression gene modules for predicting Neuroblastoma patients' survival time BIOLOGY DIRECT Han, Y., Ye, X., Cheng, J., Zhang, S., Feng, W., Han, Z., Zhang, J., Huang, K. 2019; 14: 4

    Abstract

    More than 90% of neuroblastoma patients in the low-risk group are cured, while fewer than 50% of those with high-risk disease can be cured. Since the high-risk patients still have poor outcomes, we need more accurate stratification to establish an individualized precise treatment plan for the patients to improve the long-term survival rate. We focus on extracting features and providing a workflow to improve survival prediction for neuroblastoma patients. With a workflow for gene co-expression network (GCN) mining in microarray and RNA-Seq datasets, we extracted molecular features from each co-expressed module and summarized them into eigengenes. Then we adopted the lasso-regularized Cox proportional hazards model to select the most informative eigengene features regarding association to the risk of metastasis. Nine eigengenes were selected which show strong association with patient survival prognosis. All of the nine corresponding gene modules also have highly enriched biological functions or cytoband locations. Three of them are unique modules to RNA-Seq data, which complement the modules from microarray data in terms of survival prognosis. We then merged all eigengenes from these unique modules and used an integrative method called Similarity Network Fusion to test the prognostic power of these eigengenes. The prognostic accuracies are significantly improved as compared to using all eigengenes, and a subgroup of patients with very poor survival rate was identified. We first compared GCNs mined from microarray and RNA-seq data. We discovered that each data modality yields unique GCNs, which are enriched with clear biological functions. We then performed module uniqueness analysis and used a lasso-Cox model to select survival-associated eigengenes. Integration of unique and survival-associated eigengenes from both data types provides complementary information that leads to more accurate survival prognosis. Reviewed by Susmita Datta, Marco Chierici and Dimitar Vassilev.

    View details for DOI 10.1186/s13062-018-0229-2

    View details for Web of Science ID 000458913400001

    View details for PubMedID 30760313

    View details for PubMedCentralID PMC6375203

  • Balanced Activity between Kv3 and Nav Channels Determines Fast-Spiking in Mammalian Central Neurons. iScience Gu, Y., Servello, D., Han, Z., Lalchandani, R. R., Ding, J. B., Huang, K., Gu, C. 2018; 9: 120–37

    Abstract

    Fast-spiking (FS) neurons can fire action potentials (APs) up to 1,000 Hz and play key roles in vital functions such as sound location, motor coordination, and cognition. Here we report that the concerted actions of Kv3 voltage-gated K+ (Kv) and Na+ (Nav) channels are sufficient and necessary for inducing and maintaining FS. Voltage-clamp analysis revealed a robust correlation between the Kv3/Nav current ratio and FS. Expressing Kv3 channels alone could convert 30%-60% slow-spiking (SS) neurons to FS in culture. In contrast, co-expression of either Nav1.2 or Nav1.6 together with Kv3.1 or Kv3.3, but not alone or with Kv1.2, converted SS to FS with 100% efficiency. Furthermore, RNA-sequencing-based genome-wide analysis revealed that the Kv3/Nav ratio and Kv3 expression levels strongly correlated with the maximal AP frequencies. Therefore, FS is established by the properly balanced activities of Kv3 and Nav channels and could be further fine-tuned by channel biophysical features and localization patterns.

    View details for PubMedID 30390433

  • Ordinal Multi-modal Feature Selection for Survival Analysis of Early-Stage Renal Cancer Shao, W., Cheng, J., Sun, L., Han, Z., Feng, Q., Zhang, D., Huang, K., Frangi, A. F., Schnabel, J. A., Davatzikos, C., AlberolaLopez, C., Fichtinger, G. SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 648-656
  • Integrative Analysis of Histopathological Images and Genomic Data Predicts Clear Cell Renal Cell Carcinoma Prognosis. Cancer research Cheng, J., Zhang, J., Han, Y., Wang, X., Ye, X., Meng, Y., Parwani, A., Han, Z., Feng, Q., Huang, K. 2017; 77 (21): e91-e100

    Abstract

    In cancer, both histopathologic images and genomic signatures are used for diagnosis, prognosis, and subtyping. However, combining histopathologic images with genomic data for predicting prognosis, as well as the relationships between them, has rarely been explored. In this study, we present an integrative genomics framework for constructing a prognostic model for clear cell renal cell carcinoma. We used patient data from The Cancer Genome Atlas (n = 410), extracting hundreds of cellular morphologic features from digitized whole-slide images and eigengenes from functional genomics data to predict patient outcome. The risk index generated by our model correlated strongly with survival, outperforming predictions based on considering morphologic features or eigengenes separately. The predicted risk index also effectively stratified patients in early-stage (stage I and stage II) tumors, whereas no significant survival difference was observed using staging alone. The prognostic value of our model was independent of other known clinical and molecular prognostic factors for patients with clear cell renal cell carcinoma. Overall, this workflow and the shared software code provide building blocks for applying similar approaches in other cancers. Cancer Res; 77(21); e91-100. ©2017 AACR.

    View details for DOI 10.1158/0008-5472.CAN-17-0313

    View details for PubMedID 29092949

    View details for PubMedCentralID PMC7262576

  • Ontogenetic establishment of order-specific nuclear organization in the mammalian thalamus. Nature neuroscience Shi, W., Xianyu, A., Han, Z., Tang, X., Li, Z., Zhong, H., Mao, T., Huang, K., Shi, S. H. 2017; 20 (4): 516-528

    Abstract

    The thalamus connects the cortex with other brain regions and supports sensory perception, movement, and cognitive function via numerous distinct nuclei. However, the mechanisms underlying the development and organization of diverse thalamic nuclei remain largely unknown. Here we report an intricate ontogenetic logic of mouse thalamic structures. Individual radial glial progenitors in the developing thalamus actively divide and produce a cohort of neuronal progeny that shows striking spatial configuration and nuclear occupation related to functionality. Whereas the anterior clonal cluster displays relatively more tangential dispersion and contributes predominantly to nuclei with cognitive functions, the medial ventral posterior clonal cluster forms prominent radial arrays and contributes mostly to nuclei with sensory- or motor-related activities. Moreover, the first-order and higher-order sensory and motor nuclei across different modalities are largely segregated clonally. Notably, sonic hedgehog signaling activity influences clonal spatial distribution. Our study reveals lineage relationship to be a critical regulator of nonlaminated thalamus development and organization.

    View details for DOI 10.1038/nn.4519

    View details for PubMedID 28250409

    View details for PubMedCentralID PMC5374008

  • Clonally Related GABAergic Interneurons Do Not Randomly Disperse but Frequently Form Local Clusters in the Forebrain. Neuron Sultan, K. T., Han, Z., Zhang, X. J., Xianyu, A., Li, Z., Huang, K., Shi, S. H. 2016; 92 (1): 31-44

    Abstract

    Progenitor cells in the medial ganglionic eminence (MGE) and preoptic area (PoA) give rise to GABAergic inhibitory interneurons that are distributed in the forebrain, largely in the cortex, hippocampus, and striatum. Two previous studies suggest that clonally related interneurons originating from individual MGE/PoA progenitors frequently form local clusters in the cortex. However, Mayer et al. and Harwell et al. recently argued that MGE/PoA-derived interneuron clones disperse widely and populate different forebrain structures. Here, we report further analysis of the spatial distribution of clonally related interneurons and demonstrate that interneuron clones do not non-specifically disperse in the forebrain. Around 70% of clones are restricted to one brain structure, predominantly the cortex. Moreover, the regional distribution of clonally related interneurons exhibits a clear clustering feature, which cannot occur by chance from a random diffusion. These results confirm that lineage relationship influences the spatial distribution of inhibitory interneurons in the forebrain. This Matters Arising paper is in response to Harwell et al. (2015) and Mayer et al. (2015), published in Neuron. See also the response by Turrero García et al. (2016) and Mayer et al. (2016), published in this issue.

    View details for DOI 10.1016/j.neuron.2016.09.033

    View details for PubMedID 27710787

    View details for PubMedCentralID PMC5066572

  • Deterministic Progenitor Behavior and Unitary Production of Neurons in the Neocortex CELL Gao, P., Postiglione, M. P., Krieger, T. G., Hernandez, L., Wang, C., Han, Z., Streicher, C., Papusheva, E., Insolera, R., Chugh, K., Kodish, O., Huang, K., Simons, B. D., Luo, L., Hippenmeyer, S., Shi, S. 2014; 159 (4): 775-788

    Abstract

    Radial glial progenitors (RGPs) are responsible for producing nearly all neocortical neurons. To gain insight into the patterns of RGP division and neuron production, we quantitatively analyzed excitatory neuron genesis in the mouse neocortex using Mosaic Analysis with Double Markers, which provides single-cell resolution of progenitor division patterns and potential in vivo. We found that RGPs progress through a coherent program in which their proliferative potential diminishes in a predictable manner. Upon entry into the neurogenic phase, individual RGPs produce ~8-9 neurons distributed in both deep and superficial layers, indicating a unitary output in neuronal production. Removal of OTX1, a transcription factor transiently expressed in RGPs, results in both deep- and superficial-layer neuron loss and a reduction in neuronal unit size. Moreover, ~1/6 of neurogenic RGPs proceed to produce glia. These results suggest that progenitor behavior and histogenesis in the mammalian neocortex conform to a remarkably orderly and deterministic program.

    View details for DOI 10.1016/j.cell.2014.10.027

    View details for Web of Science ID 000344522000011

    View details for PubMedCentralID PMC4225456

  • Distinct lineage-dependent structural and functional organization of the hippocampus. Cell Xu, H. T., Han, Z., Gao, P., He, S., Li, Z., Shi, W., Kodish, O., Shao, W., Brown, K. N., Huang, K., Shi, S. H. 2014; 157 (7): 1552-64

    Abstract

    The hippocampus, as part of the cerebral cortex, is essential for memory formation and spatial navigation. Although it has been extensively studied, especially as a model system for neurophysiology, the cellular processes involved in constructing and organizing the hippocampus remain largely unclear. Here, we show that clonally related excitatory neurons in the developing hippocampus are progressively organized into discrete horizontal, but not vertical, clusters in the stratum pyramidale, as revealed by both cell-type-specific retroviral labeling and mosaic analysis with double markers (MADM). Moreover, distinct from those in the neocortex, sister excitatory neurons in the cornu ammonis 1 region of the hippocampus rarely develop electrical or chemical synapses with each other. Instead, they preferentially receive common synaptic input from nearby fast-spiking (FS), but not non-FS, interneurons and exhibit synchronous synaptic activity. These results suggest that shared inhibitory input may specify horizontally clustered sister excitatory neurons as functional units in the hippocampus.

    View details for DOI 10.1016/j.cell.2014.03.067

    View details for PubMedID 24949968

    View details for PubMedCentralID PMC4120073

  • A signal processing approach for enriched region detection in RNA polymerase II ChIP-seq data. 1st Annual Meeting of the Great Lakes Bioinformatics (GLBIO) Han, Z., Tian, L., Pecot, T., Huang, T., Machiraju, R., Huang, K. BIOMED CENTRAL LTD. 2012

    Abstract

    RNA polymerase II (PolII) is essential in gene transcription, and ChIP-seq experiments have been used to study PolII binding patterns over the entire genome. However, since PolII enriched regions in the genome can be very long, existing peak finding algorithms for ChIP-seq data are not adequate for identifying such long regions. Here we propose an enriched region detection method for ChIP-seq data to identify long enriched regions by combining a signal denoising algorithm with a false discovery rate (FDR) approach. The binned ChIP-seq data for PolII are first denoised using a non-local means (NL-means) algorithm. Then, an FDR approach is developed to determine the threshold for marking enriched regions in the binned histogram. We first test our method using a public PolII ChIP-seq dataset and compare our results with published results obtained using the published algorithm HPeak. Our results show a high consistency with the published results (80-100%). Then, we apply our proposed method to PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. Specifically, in MCF7 control samples we identified 5,911 segments with a length of at least 4 Kbp (maximum 233,000 bp), and in MCF7 samples treated with E2 we identified 6,200 such segments (maximum 325,000 bp). We demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells, which enables further in-depth analysis of transcription regulation and epigenetics. Our method complements existing peak detection algorithms for ChIP-seq experiments.

    View details for DOI 10.1186/1471-2105-13-S2-S2

    View details for Web of Science ID 000303936000003

    View details for PubMedID 22536865

    View details for PubMedCentralID PMC3375632
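
    The two-step idea described in the abstract above (denoise the binned signal with NL-means, then pick an FDR-controlled threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters (patch_radius, search_radius, h, alpha), the simulated Poisson background used as the null, and the synthetic data are all assumptions made for illustration, and the abstract does not specify how the paper's FDR procedure is defined.

```python
import numpy as np

def nl_means_1d(signal, patch_radius=2, search_radius=10, h=3.0):
    """Basic 1D non-local means: each bin becomes a weighted average of
    nearby bins, weighted by how similar their surrounding patches are."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    padded = np.pad(signal, patch_radius, mode="reflect")
    # patches[i] is the window of width 2*patch_radius+1 centred on bin i
    patches = np.lib.stride_tricks.sliding_window_view(padded, 2 * patch_radius + 1)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search_radius), min(n, i + search_radius + 1)
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        w = np.exp(-d2 / (h * h))
        out[i] = np.sum(w * signal[lo:hi]) / np.sum(w)
    return out

def empirical_fdr_threshold(observed, null, alpha=0.05):
    """Smallest threshold t with (#null bins >= t) / (#observed bins >= t) <= alpha.
    ASSUMPTION: an empirical FDR estimate against a simulated null track."""
    for t in np.unique(observed):  # unique values, ascending
        n_obs = np.count_nonzero(observed >= t)
        n_null = np.count_nonzero(null >= t)
        if n_null / n_obs <= alpha:
            return t
    return np.inf

def enriched_regions(mask, bin_size):
    """Convert a per-bin boolean mask into (start_bp, end_bp) intervals."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s) * bin_size, int(e) * bin_size)
            for s, e in zip(edges[::2], edges[1::2])]

# Synthetic example (hypothetical data): low background with one long enriched block.
rng = np.random.default_rng(0)
rates = np.full(400, 2.0)
rates[150:250] = 10.0
counts = rng.poisson(rates)

denoised = nl_means_1d(counts)
# ASSUMPTION: background rate crudely estimated by the median bin count.
background = rng.poisson(np.median(counts), size=counts.size)
null = nl_means_1d(background)

threshold = empirical_fdr_threshold(denoised, null, alpha=0.05)
mask = denoised >= threshold
regions = enriched_regions(mask, bin_size=1000)
```

    In this sketch, denoising before thresholding lets a long, moderately enriched block survive as a few contiguous intervals instead of fragmenting into many short peaks, which is the motivation the abstract gives for the approach.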

  • Clonal production and organization of inhibitory interneurons in the neocortex. Science (New York, N.Y.) Brown, K. N., Chen, S., Han, Z., Lu, C. H., Tan, X., Zhang, X. J., Ding, L., Lopez-Cruz, A., Saur, D., Anderson, S. A., Huang, K., Shi, S. H. 2011; 334 (6055): 480-6

    Abstract

    The neocortex contains excitatory neurons and inhibitory interneurons. Clones of neocortical excitatory neurons originating from the same progenitor cell are spatially organized and contribute to the formation of functional microcircuits. In contrast, relatively little is known about the production and organization of neocortical inhibitory interneurons. We found that neocortical inhibitory interneurons were produced as spatially organized clonal units in the developing ventral telencephalon. Furthermore, clonally related interneurons did not randomly disperse but formed spatially isolated clusters in the neocortex. Individual clonal clusters consisting of interneurons expressing the same or distinct neurochemical markers exhibited clear vertical or horizontal organization. These results suggest that the lineage relationship plays a pivotal role in the organization of inhibitory interneurons in the neocortex.

    View details for DOI 10.1126/science.1208884

    View details for PubMedID 22034427

    View details for PubMedCentralID PMC3304494