Professional Education

  • Bachelor of Engineering, Xiamen University (2012)
  • Master of Science, Columbia University (2014)
  • Doctor of Philosophy, University of Washington (2020)

All Publications

  • Transcriptome diversity is a systematic source of variation in RNA-sequencing data. PLoS Computational Biology García-Nieto, P. E., Wang, B., Fraser, H. B. 2022; 18 (3): e1009939


    RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors such as sex, age, batches, and sequencing technology have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.

    View details for DOI 10.1371/journal.pcbi.1009939

    View details for PubMedID 35324895

  • Human 5' UTR design and variant effect prediction from a massively parallel translation assay. Nature Biotechnology Sample, P. J., Wang, B., Reid, D. W., Presnyak, V., McFadyen, I. J., Morris, D. R., Seelig, G. 2019; 37 (7): 803-+


    The ability to predict the impact of cis-regulatory sequences on gene expression would facilitate discovery in fundamental and applied biology. Here we combine polysome profiling of a library of 280,000 randomized 5' untranslated regions (UTRs) with deep learning to build a predictive model that relates human 5' UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5' UTRs that accurately direct specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,212 truncated human 5' UTRs and 3,577 naturally occurring variants and show that the model predicts ribosome loading of these sequences. Finally, we provide evidence of 45 single-nucleotide variants (SNVs) associated with human diseases that substantially change ribosome loading and thus may represent a molecular basis for disease.

    View details for DOI 10.1038/s41587-019-0164-5

    View details for Web of Science ID 000478028700026

    View details for PubMedID 31267113

    View details for PubMedCentralID PMC7100133