Bio


Tong completed her Ph.D. at the University of Rochester. She also holds an M.S. in Biostatistics from Northwestern University and a B.S. in Medical Imaging from Sichuan University.
In her research, Tong has explored topics such as subcortical and cortical neural responses to naturalistic speech and music, neural mechanisms underlying musical perception, and the impact of visual cues on speech-in-noise comprehension.
Currently, Tong is involved in the Speaker-Listener projects, where she investigates brain activity related to natural communication. She is excited to deepen her understanding of auditory processing of speech during communication and its implications for improving quality of life, particularly for clinical populations such as individuals with ASD or AD.
Outside of her research, Tong is a music producer, creating original songs and soundtracks for video games. She has a passion for exploring the intersection of art and technology.

Professional Education


  • PhD, University of Rochester, Biomedical Engineering (2024)
  • MSc, Northwestern University, Biostatistics (2018)
  • BSc, Sichuan University, Medical Technology (Medical Imaging) (2016)

All Publications


  • Subcortical responses to music and speech are alike while cortical responses diverge. Scientific Reports. Shan, T., Cappelloni, M. S., Maddox, R. K. 2024; 14 (1): 789

    Abstract

    Music and speech are encountered daily and are unique to human beings. Both are transformed by the auditory pathway from an initial acoustical encoding to higher level cognition. Studies of cortex have revealed distinct brain responses to music and speech, but differences may emerge in the cortex or may be inherited from different subcortical encoding. In the first part of this study, we derived the human auditory brainstem response (ABR), a measure of subcortical encoding, to recorded music and speech using two analysis methods. The first method, described previously and acoustically based, yielded very different ABRs between the two sound classes. The second method, however, developed here and based on a physiological model of the auditory periphery, gave highly correlated responses to music and speech. We determined the superiority of the second method through several metrics, suggesting there is no appreciable impact of stimulus class (i.e., music vs speech) on the way stimulus acoustics are encoded subcortically. In this study's second part, we considered the cortex. Our new analysis method resulted in cortical music and speech responses becoming more similar but with remaining differences. The subcortical and cortical results taken together suggest that there is evidence for stimulus-class dependent processing of music and speech at the cortical but not subcortical level.

    DOI: 10.1038/s41598-023-50438-0
    PubMed ID: 38191488
    PubMed Central ID: PMC10774448

  • Speech-In-Noise Comprehension is Improved When Viewing a Deep-Neural-Network-Generated Talking Face. Trends in Hearing. Shan, T., Wenner, C. E., Xu, C., Duan, Z., Maddox, R. K. 2022; 26: 23312165221136934

    Abstract

    Listening in a noisy environment is challenging, but many previous studies have demonstrated that comprehension of speech can be substantially improved by looking at the talker's face. We recently developed a deep neural network (DNN) based system that generates movies of a talking face from speech audio and a single face image. In this study, we aimed to quantify the benefits that such a system can bring to speech comprehension, especially in noise. The target speech audio was masked with signal to noise ratios of -9, -6, -3, and 0 dB and was presented to subjects in three audio-visual (AV) stimulus conditions: (1) synthesized AV: audio with the synthesized talking face movie; (2) natural AV: audio with the original movie from the corpus; and (3) audio-only: audio with a static image of the talker. Subjects were asked to type the sentences they heard in each trial and keyword recognition was quantified for each condition. Overall, performance in the synthesized AV condition fell approximately halfway between the other two conditions, showing a marked improvement over the audio-only control but still falling short of the natural AV condition. Every subject showed some benefit from the synthetic AV stimulus. The results of this study support the idea that a DNN-based model that generates a talking face from speech audio can meaningfully enhance comprehension in noisy environments, and has the potential to be used as a visual hearing aid.

    DOI: 10.1177/23312165221136934
    PubMed ID: 36384325
    PubMed Central ID: PMC9677167

  • Abnormal developmental of hippocampal subfields and amygdalar subnuclei volumes in young adults with heavy cannabis use: A three-year longitudinal study. Progress in Neuro-Psychopharmacology & Biological Psychiatry. Zhang, X., Chen, Z., Becker, B., Shan, T., Chen, T., Gong, Q. 2024; 136: 111156

    Abstract

    Differences in the volumes of the hippocampus and amygdala have consistently been observed between young adults with heavy cannabis use relative to their non-using counterparts. However, it remains unclear whether the subfields of these functionally and structurally heterogenous regions exhibit similar patterns of change in young adults with long-term heavy cannabis use disorder (CUD). This study aims to investigate the effects of long-term heavy cannabis use in young adults on the subregional structures of the hippocampus and amygdala, as well as their longitudinal alterations. The study sample comprised 20 young adults with heavy cannabis use and 22 matched non-cannabis using healthy volunteers. All participants completed the Cannabis Use Disorder Identification Test (CUDIT) and underwent two T1-structural magnetic resonance imaging (MRI) scans, one at baseline and another at follow-up 3 years later. The amygdala, hippocampus, and their subregions were segmented on T1-weighted anatomical MRI scans, using a previously validated procedure. At baseline, young adults with heavy CUD exhibited significantly larger volumes in several hippocampal (bilateral presubiculum, subiculum, Cornu Ammonis (CA) regions CA1, CA2-CA3, and right CA4-Dentate Gyrus (DG)) and amygdala (bilateral paralaminar nuclei, right medial nucleus, and right lateral nucleus) subregions compared to healthy controls, but these differences were attenuated at follow-up. Longitudinal analysis revealed an accelerated volumetric decrease in these subregions in young adults with heavy CUD relative to controls. Particularly, compared to healthy controls, significant accelerated volume decreases were observed in the right hippocampal subfields of the parasubiculum, subiculum, and CA4-DG. In the amygdala, similar trends of accelerated volumetric decreases were observed in the left central nucleus, right paralaminar nucleus, right basal nucleus, and right accessory basal nucleus. The current findings suggest that long-term heavy cannabis use impacts maturational process of the amygdala and hippocampus, especially in subregions with high concentrations of cannabinoid type 1 receptors (CB1Rs) and involvement in adult neurogenesis.

    DOI: 10.1016/j.pnpbp.2024.111156
    PubMed ID: 39353549

  • Long-term tract-specific white matter microstructural changes after acute stress. Brain Imaging and Behavior. Meng, L., Shan, T., Li, K., Gong, Q. 2021; 15 (4): 1868-1875

    Abstract

    Acute stress has substantial impact on white matter microstructure of people exposed to trauma. Its long-term consequence and how the brain changes from the stress remain unclear. In this study, we address this issue via diffusion tensor imaging (DTI). Twenty-two trauma-exposed individuals who did not meet post-traumatic stress disorder (PTSD) diagnostic criteria were recruited from the most affected area of Wenchuan earthquake and scanned twice (within twenty-five days and two years after the quake, respectively). Their emotional distress was evaluated with the Self-Rating Anxiety/Depression Scales (SAS/SDS) at both scans. Automatic fiber quantification was used to examine brain microstructure alterations. Correlation analyses were also conducted to investigate relationships between brain microstructure changes and symptom improvement. A group of demographically matched healthy controls (N = 22) from another project were scanned once before the quake using the same imaging protocols as used with trauma-exposed non-PTSD (TENP) participants. Two years after the earthquake, TENP individuals exhibited significantly reduced FA in the parietal portion of left superior longitudinal fasciculus and high FA in the parietal portion of left corticospinal tract. Over the follow-up, increased FA of the left uncinate fasciculus and the left corticospinal tract with parallel reduction of SAS and SDS were observed in TENP. No significant association was found between brain microstructure changes and symptom improvement. These results indicate changes in WM microstructure integrity of TENP brains parallel with symptom improvement over time after acute stress. However, the change would be a long-term process without external intervention.

    DOI: 10.1007/s11682-020-00380-w
    PubMed ID: 32918183
    PubMed Central ID: PMC8413208