Professional Education

  • Doctor of Philosophy, Pennsylvania State University (2022)
  • Master of Applied Statistics, Pennsylvania State University (2022)
  • Bachelor of Science, Pennsylvania State University (2018)

All Publications

  • Activation of GPR44 decreases severity of myeloid leukemia via specific targeting of leukemia initiating stem cells. Cell Reports Qian, F., Nettleford, S. K., Zhou, J., Arner, B. E., Hall, M. A., Sharma, A., Annageldiyev, C., Rossi, R. M., Tukaramrao, D. B., Sarkar, D., Hegde, S., Gandhi, U. H., Finch, E. R., Goodfield, L., Quickel, M. D., Claxton, D. F., Paulson, R. F., Prabhu, K. S. 2023; 42 (7): 112794


    Relapse of acute myeloid leukemia (AML) remains a significant concern due to persistent leukemia-initiating stem cells (LICs) that are typically not targeted by most existing therapies. Using a murine AML model, human AML cell lines, and patient samples, we show that AML LICs are sensitive to endogenous and exogenous cyclopentenone prostaglandin-J (CyPG), Δ12-PGJ2, and 15d-PGJ2, which are increased upon dietary selenium supplementation via the cyclooxygenase-hematopoietic PGD synthase pathway. CyPGs are endogenous ligands for peroxisome proliferator-activated receptor gamma and GPR44 (CRTH2; PTGDR2). Deletion of GPR44 in a mouse model of AML exacerbated the disease, suggesting that GPR44 activation mediates selenium-mediated apoptosis of LICs. Transcriptomic analysis of GPR44-/- LICs indicated that GPR44 activation by CyPGs suppressed KRAS-mediated MAPK and PI3K/AKT/mTOR signaling pathways to enhance apoptosis. Our studies show the role of GPR44, providing mechanistic underpinnings of the chemopreventive and chemotherapeutic properties of selenium and CyPGs in AML.

    View details for DOI 10.1016/j.celrep.2023.112794

    View details for PubMedID 37459233

  • Dynamic assessment of the COVID-19 vaccine acceptance leveraging social media data JOURNAL OF BIOMEDICAL INFORMATICS Li, L., Zhou, J., Ma, Z., Bensi, M. T., Hall, M. A., Baecher, G. B. 2022; 129: 104054


    Vaccination is the most effective way to provide long-lasting immunity against viral infection; thus, rapid assessment of vaccine acceptance is a pressing challenge for health authorities. Prior studies have applied survey techniques to investigate vaccine acceptance, but these may be slow and expensive. This study investigates 29 million vaccine-related tweets from August 8, 2020 to April 19, 2021 and proposes a social media-based approach that derives a vaccine acceptance index (VAI) to quantify Twitter users' opinions on COVID-19 vaccination. This index is calculated based on opinion classifications identified with the aid of natural language processing techniques and provides a quantitative metric to indicate the level of vaccine acceptance across different geographic scales in the U.S. The VAI is easily calculated from the number of positive and negative tweets posted by specific users and groups of users; it can be compiled for regions such as counties or states to provide geospatial information, and it can be tracked over time to assess changes in vaccine acceptance as related to trends in the media and politics. At the national level, it showed that the VAI moved from negative to positive in 2020 and remained steady after January 2021. Through exploratory analysis of state- and county-level data, reliable assessments of VAI against subsequent vaccination rates could be made for counties with at least 30 users. The paper discusses information characteristics that enable consistent estimation of VAI. The findings support the use of social media to understand opinions and to offer a timely and cost-effective way to assess vaccine acceptance.

    View details for DOI 10.1016/j.jbi.2022.104054

    View details for Web of Science ID 000788753600001

    View details for PubMedID 35331966

    View details for PubMedCentralID PMC8935963

  • Novel EDGE encoding method enhances ability to identify genetic interactions PLOS GENETICS Hall, M. A., Wallace, J., Lucas, A. M., Bradford, Y., Verma, S. S., Mueller-Myhsok, B., Passero, K., Zhou, J., McGuigan, J., Jiang, B., Pendergrass, S. A., Zhang, Y., Peissig, P., Brilliant, M., Sleiman, P., Hakonarson, H., Harley, J. B., Kiryluk, K., Van Steen, K., Moore, J. H., Ritchie, M. D. 2021; 17 (6): e1009534


    Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). 
    We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.

    View details for DOI 10.1371/journal.pgen.1009534

    View details for Web of Science ID 000664356500001

    View details for PubMedID 34086673

    View details for PubMedCentralID PMC8208534

  • Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data. Pacific Symposium on Biocomputing Passero, K., He, X., Zhou, J., Mueller-Myhsok, B., Kleber, M. E., Maerz, W., Hall, M. A. 2020; 25: 659-670


    Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well-defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, utilizing the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acids levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significant in association with dihomo-γ-linolenic acid, of which five were within fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that is modifiable for application to other large datasets with heterogenous phenotypes. 
    As investigation of complex traits continues beyond traditional genome-wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial in the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.

    View details for PubMedID 31797636

  • Investigation of gene-gene interactions in cardiac traits and serum fatty acid levels in the LURIC Health Study PLOS ONE Zhou, J., Passero, K., Palmiero, N. E., Mueller-Myhsok, B., Kleber, M. E., Maerz, W., Hall, M. A. 2020; 15 (9): e0238304


    Epistasis analysis elucidates the effects of gene-gene interactions (G×G) between multiple loci for complex traits. However, the large computational demands and the high multiple testing burden impede their discoveries. Here, we illustrate the utilization of two methods, main effect filtering based on individual GWAS results and biological knowledge-based modeling through Biofilter software, to reduce the number of interactions tested among single nucleotide polymorphisms (SNPs) for 15 cardiac-related traits and 14 fatty acids. We performed interaction analyses using the two filtering methods, adjusting for age, sex, body mass index (BMI), waist-hip ratio, and the first three principal components from genetic data, among 2,824 samples from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study. Using Biofilter, one interaction nearly met Bonferroni significance: an interaction between rs7735781 in XRCC4 and rs10804247 in XRCC5 was identified for venous thrombosis with a Bonferroni-adjusted likelihood ratio test (LRT) p: 0.0627. A total of 57 interactions were identified from main effect filtering for the cardiac traits G×G (10) and fatty acids G×G (47) at Bonferroni-adjusted LRT p < 0.05. For cardiac traits, the top interaction involved SNPs rs1383819 in SNTG1 and rs1493939 (138kb from 5' of SAMD12) with Bonferroni-adjusted LRT p: 0.0228 which was significantly associated with history of arterial hypertension. For fatty acids, the top interaction between rs4839193 in KCND3 and rs10829717 in LOC107984002 with Bonferroni-adjusted LRT p: 2.28×10⁻⁵ was associated with 9-trans 12-trans octadecanoic acid, an omega-6 trans fatty acid. The model inflation factor for the interactions under different filtering methods was evaluated from the standard median and the linear regression approach. Here, we applied filtering approaches to identify numerous genetic interactions related to cardiac-related outcomes as potential targets for therapy. The approaches described offer ways to detect epistasis in complex traits and to improve precision medicine capability.

    View details for DOI 10.1371/journal.pone.0238304

    View details for Web of Science ID 000571887500145

    View details for PubMedID 32915819

    View details for PubMedCentralID PMC7485803

  • Long Non-coding RNA TDRKH-AS1 Promotes Colorectal Cancer Cell Proliferation and Invasion Through the beta-Catenin Activated Wnt Signaling Pathway FRONTIERS IN ONCOLOGY Jiao, Y., Zhou, J., Jin, Y., Yang, Y., Song, M., Zhang, L., Zhou, J., Zhang, J. 2020; 10: 639


    Colorectal cancer (CRC) is a common cancer worldwide, with a low 5-year survival rate. Recently, long non-coding RNAs (lncRNAs) have been well-studied as oncogenes or tumor suppressors in multiple malignancies, including CRC. However, their biological functions and potential mechanisms in human cancer remain unclear. Here, we evaluated the expression of TDRKH-AS1 in CRC tissues and identified its potential targets. We found that TDRKH-AS1 is upregulated in the majority of CRC patients, and that its upregulation is significantly correlated with their malignant characteristics and dismal prognoses. Based on in vitro experiments, high expression of TDRKH-AS1 can substantially promote cancer cell proliferation and invasion. We also found that TDRKH-AS1 targets β-catenin in the Wnt signaling pathway to exert its carcinogenic activity. TDRKH-AS1 could serve as a promising prognostic predictor and a potential therapeutic target for further early diagnoses and treatments via a non-invasive method.

    View details for DOI 10.3389/fonc.2020.00639

    View details for Web of Science ID 000538517900001

    View details for PubMedID 32670860

    View details for PubMedCentralID PMC7326065

  • CLARITE Facilitates the Quality Control and Analysis Process for EWAS of Metabolic-Related Traits FRONTIERS IN GENETICS Lucas, A. M., Palmiero, N. E., McGuigan, J., Passero, K., Zhou, J., Orie, D., Ritchie, M. D., Hall, M. A. 2019; 10: 1240


    While genome-wide association studies are an established method of identifying genetic variants associated with disease, environment-wide association studies (EWAS) highlight the contribution of nongenetic components to complex phenotypes. However, the lack of high-throughput quality control (QC) pipelines for EWAS data lends itself to analysis plans where the data are cleaned after a first-pass analysis, which can lead to bias, or are cleaned manually, which is arduous and susceptible to user error. We offer a novel software, CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures (CLARITE), as a tool to efficiently clean environmental data, perform regression analysis, and visualize results on a single platform through user-guided automation. It exists as both an R package and a Python package. Though CLARITE focuses on EWAS, it is intended to also improve the QC process for phenotypes and clinical lab measures for a variety of downstream analyses, including phenome-wide association studies and gene-environment interaction studies. With the goal of demonstrating the utility of CLARITE, we performed a novel EWAS in the National Health and Nutrition Examination Survey (NHANES) (N overall Discovery=9063, N overall Replication=9874) for body mass index (BMI) and over 300 environment variables post-QC, adjusting for sex, age, race, socioeconomic status, and survey year. The analysis used survey weights along with cluster and strata information in order to account for the complex survey design. Sixteen BMI results replicated at a Bonferroni corrected p < 0.05. The top replicating results were serum levels of γ-tocopherol (vitamin E) (Discovery Bonferroni p: 8.67×10⁻¹², Replication Bonferroni p: 2.70×10⁻⁹) and iron (Discovery Bonferroni p: 1.09×10⁻⁸, Replication Bonferroni p: 1.73×10⁻¹⁰). Results of this EWAS are important to consider for metabolic trait analysis, as BMI is tightly associated with these phenotypes. As such, exposures predictive of BMI may be useful for covariate and/or interaction assessment of metabolic-related traits. CLARITE allows improved data quality for EWAS, gene-environment interactions, and phenome-wide association studies by establishing a high-throughput quality control infrastructure. Thus, CLARITE is recommended for studying the environmental factors underlying complex disease.

    View details for DOI 10.3389/fgene.2019.01240

    View details for Web of Science ID 000504982600001

    View details for PubMedID 31921293

    View details for PubMedCentralID PMC6930237