2016, BS, Nanjing University, China
2017-2022, Ph.D., University of Georgia, Advisors: Arthur Edison and Jonathan Arnold
2022-present, Postdoc, Stanford University, Advisor: Michael Snyder

Honors & Awards

  • Graduate school travel grant, UGA (2019)
  • Outstanding graduate with honor, Nanjing University (2016)
  • First-class People’s Scholarship, Nanjing University (2014 – 2015)
  • Silver Medal, iGEM (International Genetically Engineered Machine) competition (2014)

Stanford Advisors

Current Research and Scholarly Interests

I build computational methods to integrate and model biological time series, including metabolic dynamics, longitudinal multi-omics data, and micro-sampling data. I reduce dimensionality, build clusters, and search for causal links.

1. Knowledge extraction from time-series metabolic systems. Recent developments in omics approaches provide a comprehensive view of a biological system at a single time point. However, our understanding of dynamic responses to environmental perturbation is still limited by both data collection and computational analysis. I contributed to an NMR approach for collecting time-series metabolic data. I then designed a computational method to efficiently extract chemical information from these large, high-dimensional datasets. This provides rich information on dynamic metabolic processes under different environments. Through modeling and time-series analysis, I uncovered biological regulation in carbon metabolism and glycogen utilization from these high-dimensional time series. The resulting workflow for understanding metabolic dynamics and regulation can be extended to other fermentation systems and to the study of metabolic disease in humans.

2. Automation in phenotyping biological systems. New experimental approaches (e.g., microscopic devices) enable the recording of thousands of samples in a short time, which greatly promotes the phenotyping of plants, fungi, and human tissues. However, image annotation and information extraction are still largely manual. I built multiple frameworks that classify phenotypes with ResNet in PyTorch, associate them with genomic information, and uncover important structures through feature importance evaluation. I also built image segmentation programs with Detectron2 to annotate different symbiosis structures of arbuscular mycorrhiza and worm populations. Automation in phenotyping greatly

All Publications

  • Computer vision models enable mixed linear modeling to predict arbuscular mycorrhizal fungal colonization using fungal morphology. Scientific reports Zhang, S., Wu, Y., Skaro, M., Cheong, J. H., Bouffier-Landrum, A., Torrres, I., Guo, Y., Stupp, L., Lincoln, B., Prestel, A., Felt, C., Spann, S., Mandal, A., Johnson, N., Arnold, J. 2024; 14 (1): 10866


    The presence of Arbuscular Mycorrhizal Fungi (AMF) in vascular land plant roots is one of the most ancient of symbioses supporting nitrogen and phosphorus exchange for photosynthetically derived carbon. Here we provide a multi-scale modeling approach to predict AMF colonization of a worldwide crop from a Recombinant Inbred Line (RIL) population derived from Sorghum bicolor and S. propinquum. The high-throughput phenotyping methods of fungal structures here rely on a Mask Region-based Convolutional Neural Network (Mask R-CNN) in computer vision for pixel-wise fungal structure segmentations and mixed linear models to explore the relations of AMF colonization, root niche, and fungal structure allocation. Models proposed capture over 95% of the variation in AMF colonization as a function of root niche and relative abundance of fungal structures in each plant. Arbuscule allocation is a significant predictor of AMF colonization among sibling plants. Arbuscules and extraradical hyphae implicated in nutrient exchange predict highest AMF colonization in the top root section. Our work demonstrates that deep learning can be used by the community for the high-throughput phenotyping of AMF in plant roots. Mixed linear modeling provides a framework for testing hypotheses about AMF colonization phenotypes as a function of root niche and fungal structure allocations.

    View details for DOI 10.1038/s41598-024-61181-5

    View details for PubMedID 38740920

    View details for PubMedCentralID 9256619

  • SAND: Automated Time-Domain Modeling of NMR Spectra Applied to Metabolite Quantification. Analytical chemistry Wu, Y., Sanati, O., Uchimiya, M., Krishnamurthy, K., Wedell, J., Hoch, J. C., Edison, A. S., Delaglio, F. 2024


    Developments in untargeted nuclear magnetic resonance (NMR) metabolomics enable the profiling of thousands of biological samples. The exploitation of this rich source of information requires a detailed quantification of spectral features. However, the development of a consistent and automatic workflow has been challenging because of extensive signal overlap. To address this challenge, we introduce the software Spectral Automated NMR Decomposition (SAND). SAND follows on from the previous success of time-domain modeling and automatically quantifies entire spectra without manual interaction. The SAND approach uses hybrid optimization with Markov chain Monte Carlo methods, employing subsampling in both time and frequency domains. In particular, SAND randomly divides the time-domain data into training and validation sets to help avoid overfitting. We demonstrate the accuracy of SAND, which provides a correlation of 0.9 with ground truth on cases including highly overlapped simulated data sets, a two-compound mixture, and a urine sample spiked with different amounts of a four-compound mixture. We further demonstrate an automated annotation using correlation networks derived from SAND decomposed peaks, and on average, 74% of peaks for each compound can be recovered in single clusters. SAND is available in NMRbox, the cloud computing environment for NMR software hosted by the Network for Advanced NMR (NAN). Since the SAND method uses time-domain subsampling (i.e., random subset of time-domain points), it has the potential to be extended to a higher dimensionality and nonuniformly sampled data.

    View details for DOI 10.1021/acs.analchem.3c03078

    View details for PubMedID 38273718

  • Characterizing the gene-environment interaction underlying natural morphological variation in Neurospora crassa conidiophores using high-throughput phenomics and transcriptomics G3-GENES GENOMES GENETICS Krach, E. K., Skaro, M., Wu, Y., Arnold, J. 2022; 12 (4)


    Neurospora crassa propagates through dissemination of conidia, which develop through specialized structures called conidiophores. Recent work has identified striking variation in conidiophore morphology, using a wild population collection from Louisiana, United States of America to classify 3 distinct phenotypes: Wild-Type, Wrap, and Bulky. Little is known about the impact of these phenotypes on sporulation or germination later in the N. crassa life cycle, or about the genetic variation that underlies them. In this study, we show that conidiophore morphology likely affects colonization capacity of wild N. crassa isolates through both sporulation distance and germination on different carbon sources. We generated and crossed homokaryotic strains belonging to each phenotypic group to more robustly fit a model for and estimate heritability of the complex trait, conidiophore architecture. Our fitted model suggests at least 3 genes and 2 epistatic interactions contribute to conidiophore phenotype, which has an estimated heritability of 0.47. To uncover genes contributing to these phenotypes, we performed RNA-sequencing on mycelia and conidiophores of strains representing each of the 3 phenotypes. Our results show that the Bulky strain had a distinct transcriptional profile from that of Wild-Type and Wrap, exhibiting differential expression patterns in clock-controlled genes (ccgs), the conidiation-specific gene con-6, and genes implicated in metabolism and communication. Combined, these results present novel ecological impacts of and differential gene expression underlying natural conidiophore morphological variation, a complex trait that has not yet been thoroughly explored.

    View details for DOI 10.1093/g3journal/jkac050

    View details for Web of Science ID 000769668200001

    View details for PubMedID 35293585

    View details for PubMedCentralID PMC8982394

  • Uncovering in vivo biochemical patterns from time-series metabolic dynamics. PloS one Wu, Y., Judge, M. T., Edison, A. S., Arnold, J. 2022; 17 (5): e0268394


    System biology relies on holistic biomolecule measurements, and untangling biochemical networks requires time-series metabolomics profiling. With current metabolomic approaches, time-series measurements can be taken for hundreds of metabolic features, which decode underlying metabolic regulation. Such a metabolomic dataset is untargeted with most features unannotated and inaccessible to statistical analysis and computational modeling. The high dimensionality of the metabolic space also causes mechanistic modeling to be rather cumbersome computationally. We implemented a faster exploratory workflow to visualize and extract chemical and biochemical dependencies. Time-series metabolic features (about 300 for each dataset) were extracted by Ridge Tracking-based Extract (RTExtract) on measurements from continuous in vivo monitoring of metabolism by NMR (CIVM-NMR) in Neurospora crassa under different conditions. The metabolic profiles were then smoothed and projected into lower dimensions, enabling a comparison of metabolic trends in the cultures. Next, we expanded incomplete metabolite annotation using a correlation network. Lastly, we uncovered meaningful metabolic clusters by estimating dependencies between smoothed metabolic profiles. We thus sidestepped the processes of time-consuming mechanistic modeling, difficult global optimization, and labor-intensive annotation. Multiple clusters guided insights into central energy metabolism and membrane synthesis. Dense connections with glucose 1-phosphate indicated its central position in metabolism in N. crassa. Our approach was benchmarked on simulated random network dynamics and provides a novel exploratory approach to analyzing high-dimensional metabolic dynamics.

    View details for DOI 10.1371/journal.pone.0268394

    View details for PubMedID 35550643

  • Wild Isolates of Neurospora crassa Reveal Three Conidiophore Architectural Phenotypes MICROORGANISMS Krach, E. K., Wu, Y., Skaro, M., Mao, L., Arnold, J. 2020; 8 (11)


    The vegetative life cycle in the model filamentous fungus, Neurospora crassa, relies on the development of conidiophores to produce new spores. Environmental, temporal, and genetic components of conidiophore development have been well characterized; however, little is known about their morphological variation. We explored conidiophore architectural variation in a natural population using a wild population collection of 21 strains from Louisiana, United States of America (USA). Our work reveals three novel architectural phenotypes, Wild Type, Bulky, and Wrap, and shows their maintenance throughout the duration of conidiophore development. Furthermore, we present a novel image-classifier using a convolutional neural network specifically developed to assign conidiophore architectural phenotypes in a high-throughput manner. To estimate an inheritance model for this discrete complex trait, crosses between strains of each phenotype were conducted, and conidiophores of subsequent progeny were characterized using the trained classifier. Our model suggests that conidiophore architecture is controlled by at least two genes and has a heritability of 0.23. Additionally, we quantified the number of conidia produced by each conidiophore type and their dispersion distance, suggesting that conidiophore architectural phenotype may impact N. crassa colonization capacity.

    View details for DOI 10.3390/microorganisms8111760

    View details for Web of Science ID 000593214700001

    View details for PubMedID 33182369

    View details for PubMedCentralID PMC7695285

  • RTExtract: time-series NMR spectra quantification based on 3D surface ridge tracking BIOINFORMATICS Wu, Y., Judge, M. T., Arnold, J., Bhandarkar, S. M., Edison, A. S. 2020; 36 (20): 5068-5075


    Time-series nuclear magnetic resonance (NMR) has advanced our knowledge about metabolic dynamics. Before analyzing compounds through modeling or statistical methods, chemical features need to be tracked and quantified. However, because of peak overlap and peak shifting, the available protocols are time consuming at best or even impossible for some regions in NMR spectra. We introduce Ridge Tracking-based Extract (RTExtract), a computer vision-based algorithm, to quantify time-series NMR spectra. The NMR spectra of multiple time points were formulated as a 3D surface. Candidate points were first filtered using local curvature and optima, then connected into ridges by a greedy algorithm. Interactive steps were implemented to refine results. Among 173 simulated ridges, 115 can be tracked (RMSD < 0.001). For reproducing previous results, RTExtract took less than 2 h instead of ∼48 h, and two instead of seven parameters need tuning. Multiple regions with overlapping and changing chemical shifts are accurately tracked. Source code is freely available within Metabolomics toolbox GitHub repository ( and is implemented in MATLAB and R. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btaa631

    View details for Web of Science ID 000605690100013

    View details for PubMedID 32653900

    View details for PubMedCentralID PMC7755419

  • Continuous in vivo Metabolism by NMR FRONTIERS IN MOLECULAR BIOSCIENCES Judge, M. T., Wu, Y., Tayyari, F., Haffori, A., Glushka, J., Ito, T., Arnold, J., Edison, A. S. 2019; 6: 26


    Dense time-series metabolomics data are essential for unraveling the underlying dynamic properties of metabolism. Here we extend high-resolution-magic angle spinning (HR-MAS) to enable continuous in vivo monitoring of metabolism by NMR (CIVM-NMR) and provide analysis tools for these data. First, we reproduced a result in human chronic lymphoid leukemia cells by using isotope-edited CIVM-NMR to rapidly and unambiguously demonstrate unidirectional flux in branched-chain amino acid metabolism. We then collected untargeted CIVM-NMR datasets for Neurospora crassa, a classic multicellular model organism, and uncovered dynamics between central carbon metabolism, amino acid metabolism, energy storage molecules, and lipid and cell wall precursors. Virtually no sample preparation was required to yield a dynamic metabolic fingerprint over hours to days at ~4-min temporal resolution with little noise. CIVM-NMR is simple and readily adapted to different types of cells and microorganisms, offering an experimental complement to kinetic models of metabolism for diverse biological systems.

    View details for DOI 10.3389/fmolb.2019.00026

    View details for Web of Science ID 000466811700001

    View details for PubMedID 31114791

    View details for PubMedCentralID PMC6502900

  • Genome-Wide Analysis Reveals Ancestral Lack of Seventeen Different tRNAs and Clade-Specific Loss of tRNA-CNNs in Archaea FRONTIERS IN MICROBIOLOGY Wu, Y., Wu, P., Wang, B., Shao, Z. 2018; 9: 1245


    Transfer RNA (tRNA) is a category of RNAs that specifically decode messenger RNAs (mRNAs) into proteins by recognizing a set of 61 codons commonly adopted by different life domains. The composition and abundance of tRNAs play critical roles in shaping codon usage and pairing bias, which subsequently modulate mRNA translation efficiency and accuracy. Over the past few decades, effort has been concentrated on evaluating the specificity and redundancy of different tRNA families. However, the mechanism and processes underlying tRNA evolution have only rarely been investigated. In this study, by surveying tRNA genes in 167 completely sequenced genomes, we systematically investigated the composition and evolution of tRNAs in Archaea from a phylogenetic perspective. Our data revealed that archaeal genomes are compact in both tRNA types and copy number. Generally, no more than 44 different types of tRNA are present in archaeal genomes to decode the 61 canonical codons, and most of them have only one gene copy per genome. Among them, tRNA-Met was significantly overrepresented, with an average of three copies per genome. In contrast, the tRNA-UAU and 16 tRNAs with A-starting anticodons (tRNA-ANNs) were rarely detected in all archaeal genomes. The conspicuous absence of these tRNAs across the archaeal phylogeny suggests they might have not been evolved in the common ancestor of Archaea, rather than have lost independently from different clades. Furthermore, widespread absence of tRNA-CNNs in the Methanococcales and Methanobacteriales genomes indicates convergent loss of these tRNAs in the two clades. This clade-specific tRNA loss may be attributing to the reductive evolution of their genomes. Our data suggest that the current tRNA profiles in Archaea are contributed not only by the ancestral tRNA composition, but also by differential maintenance and loss of redundant tRNAs.

    View details for DOI 10.3389/fmicb.2018.01245

    View details for Web of Science ID 000434397000001

    View details for PubMedID 29930548

    View details for PubMedCentralID PMC6000648

  • Systematic analyses of glutamine and glutamate metabolisms across different cancer types CHINESE JOURNAL OF CANCER Tian, Y., Du, W., Cao, S., Wu, Y., Dong, N., Wang, Y., Xu, Y. 2017; 36: 88


    Glutamine and glutamate are known to play important roles in cancer biology. However, no detailed information is available in terms of their levels of involvement in various biological processes across different cancer types, whereas such knowledge could be critical for understanding the distinct characteristics of different cancer types. Our computational study aimed to examine the functional roles of glutamine and glutamate across different cancer types. We conducted a comparative analysis of gene expression data of cancer tissues versus normal control tissues of 11 cancer types to understand glutamine and glutamate metabolisms in cancer. Specifically, we developed a linear regression model to assess differential contributions by glutamine and/or glutamate to each of seven biological processes in cancer versus control tissues. While our computational predictions were consistent with some of the previous observations, multiple novel predictions were made: (1) glutamine is generally not involved in purine synthesis in cancer except for breast cancer, and is similarly not involved in pyridine synthesis except for kidney cancer; (2) glutamine is generally not involved in ATP production in cancer; (3) glutamine's contribution to nucleotide synthesis is minimal if any in cancer; (4) glutamine is not involved in asparagine synthesis in cancer except for bladder and lung cancers; and (5) glutamate does not contribute to serine synthesis except for bladder cancer. We comprehensively predicted the roles of glutamine and glutamate metabolisms in selected metabolic pathways in cancer tissues versus control tissues, which may lead to novel approaches to therapeutic development targeted at glutamine and/or glutamate metabolism. However, our predictions need further functional validation.

    View details for DOI 10.1186/s40880-017-0255-y

    View details for Web of Science ID 000414851000002

    View details for PubMedID 29116024

    View details for PubMedCentralID PMC5678792

  • Large-Scale Analyses of Angiosperm Nucleotide-Binding Site-Leucine-Rich Repeat Genes Reveal Three Anciently Diverged Classes with Distinct Evolutionary Patterns PLANT PHYSIOLOGY Shao, Z., Xue, J., Wu, P., Zhang, Y., Wu, Y., Hang, Y., Wang, B., Chen, J. 2016; 170 (4): 2095-2109


    Nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes make up the largest plant disease resistance gene family (R genes), with hundreds of copies occurring in individual angiosperm genomes. However, the expansion history of NBS-LRR genes during angiosperm evolution is largely unknown. By identifying more than 6,000 NBS-LRR genes in 22 representative angiosperms and reconstructing their phylogenies, we present a potential framework of NBS-LRR gene evolution in the angiosperm. Three anciently diverged NBS-LRR classes (TNLs, CNLs, and RNLs) were distinguished with unique exon-intron structures and DNA motif sequences. A total of seven ancient TNL, 14 CNL, and two RNL lineages were discovered in the ancestral angiosperm, from which all current NBS-LRR gene repertoires were evolved. A pattern of gradual expansion during the first 100 million years of evolution of the angiosperm clade was observed for CNLs. TNL numbers remained stable during this period but were eventually deleted in three divergent angiosperm lineages. We inferred that an intense expansion of both TNL and CNL genes started from the Cretaceous-Paleogene boundary. Because dramatic environmental changes and an explosion in fungal diversity occurred during this period, the observed expansions of R genes probably reflect convergent adaptive responses of various angiosperm families. An ancient whole-genome duplication event that occurred in an angiosperm ancestor resulted in two RNL lineages, which were conservatively evolved and acted as scaffold proteins for defense signal transduction. Overall, the reconstructed framework of angiosperm NBS-LRR gene evolution in this study may serve as a fundamental reference for better understanding angiosperm NBS-LRR genes.

    View details for DOI 10.1104/pp.15.01487

    View details for Web of Science ID 000375424200016

    View details for PubMedID 26839128

    View details for PubMedCentralID PMC4825152

  • Identification of Arbuscular Mycorrhiza (AM)-Responsive microRNAs in Tomato FRONTIERS IN PLANT SCIENCE Wu, P., Wu, Y., Liu, C., Liu, L., Ma, F., Wu, X., Wu, M., Hang, Y., Chen, J., Shao, Z., Wang, B. 2016; 7: 429


    A majority of land plants can form symbiosis with arbuscular mycorrhizal (AM) fungi. MicroRNAs (miRNAs) have been implicated to regulate this process in legumes, but their involvement in non-legume species is largely unknown. In this study, by performing deep sequencing of sRNA libraries in tomato roots and comparing with tomato genome, a total of 700 potential miRNAs were predicted, among them, 187 are known plant miRNAs that have been previously deposited in miRBase. Unlike the profiles in other plants such as rice and Arabidopsis, a large proportion of predicted tomato miRNAs was 24 nt in length. A similar pattern was observed in the potato genome but not in tobacco, indicating a Solanum genus-specific expansion of 24-nt miRNAs. About 40% identified tomato miRNAs showed significantly altered expressions upon Rhizophagus irregularis inoculation, suggesting the potential roles of these novel miRNAs in AM symbiosis. The differential expression of five known and six novel miRNAs were further validated using qPCR analysis. Interestingly, three up-regulated known tomato miRNAs belong to a known miR171 family, a member of which has been reported in Medicago truncatula to regulate AM symbiosis. Thus, the miR171 family likely regulates AM symbiosis conservatively across different plant lineages. More than 1000 genes targeted by potential AM-responsive miRNAs were provided and their roles in AM symbiosis are worth further exploring.

    View details for DOI 10.3389/fpls.2016.00429

    View details for Web of Science ID 000373264200004

    View details for PubMedID 27066061

    View details for PubMedCentralID PMC4814767