Bio


I am a postdoctoral scholar and research associate in the Division of Cardiovascular Medicine, School of Medicine, Stanford University, and at the VA Palo Alto Health Care System, where I lead the identification of genes responsible for cardiovascular disease. My research focuses on population genetics and precision public health.

Boards, Advisory Committees, Professional Organizations


  • Editorial Board: Academic Editor, PeerJ Computer Science (2024 - Present)
  • Trusted Reviewer Board, Health and Quality of Life Outcomes (2024 - Present)
  • Reviewer, Genome Biology (2025 - Present)
  • Reviewer, PLoS ONE (2025 - Present)
  • Reviewer, JCO Clinical Cancer Informatics (2024 - Present)
  • Reviewer, Journal of the American Medical Informatics Association (JAMIA) (2024 - Present)
  • Reviewer, npj Digital Medicine (2024 - Present)
  • Reviewer, Circulation: Genomic and Precision Medicine (2023 - Present)
  • Reviewer, BMC Journals (2023 - Present)
  • Reviewer, Scientific Reports (2023 - Present)
  • Reviewer, Journal of Orthopaedic Surgery and Research (2023 - Present)
  • Reviewer, European Journal of Medical Research (2023 - Present)
  • Reviewer, Journal of Cancer Research and Clinical Oncology (2023 - Present)
  • Reviewer, Frontiers Journals (2023 - Present)
  • Reviewer, Analytical Cellular Pathology (2022 - Present)
  • Reviewer, Evidence-Based Complementary and Alternative Medicine (2022 - Present)

Professional Education


  • Doctor of Philosophy, The Pennsylvania State University, Pathobiology (Bioinformatics and Human Genetics) (2023)
  • Master of Applied Statistics, The Pennsylvania State University, Applied Statistics (2022)
  • Bachelor of Science, The Pennsylvania State University, Biochemistry and Molecular Biology (2018)
  • Bachelor of Science, The Pennsylvania State University, Immunology and Infectious Diseases (2018)

All Publications


  • Comparative Efficacy of Various Exercise Types on Cancer-Related Fatigue for Cancer Survivors: A Systematic Review and Network Meta-Analysis of Randomized Controlled Trials. Cancer medicine Zhou, S., Chen, G., Xu, X., Zhang, C., Chen, G., Chan, Y. T., Sun, Y. X., Zhou, J., Wang, N., Feng, Y. 2025; 14 (7): e70816

    Abstract

    This study compares the effectiveness of 7 types of guideline-recommended first-line exercises for cancer-related fatigue (CRF). A comprehensive search was conducted utilizing public databases, including Medline, Embase, Web of Science, and Cochrane Library. Randomized clinical trials examining the effects of aerobic exercise, resistance exercise, stretching exercise, combined aerobic and resistance exercise, Yoga, Qigong, or Tai Chi on CRF in various cancer types were included. A Bayesian network meta-analysis was used to synthesize the data. Subgroup analyses and sensitivity analyses were used to detect the effect modifiers and to confirm the robustness, respectively. A total of 33 clinical trials were included in this analysis. Overall, both resistance (SMD, -1.72; 95% CI, -2.81 to -0.63) and Yoga (SMD, -1.27; 95% CI, -1.38 to -1.16) reduced the fatigue severity significantly better than standard care, but there was no significant decrease for other exercise types. For cancer survivors with an age over 55 years, only Yoga showed statistically significant improvement in CRF (SMD, -1.27; 95% CI, -1.38 to -1.16). For patients with an age less than 55 years, both resistance (SMD, -1.75; 95% CI, -2.91 to -0.58) and Yoga (SMD, -1.66; 95% CI, -2.81 to -0.51) reduced the fatigue severity compared to standard care. Both resistance exercise and yoga showed significant benefits in alleviating CRF compared to standard care. Yoga was particularly effective for cancer survivors over 55 years of age, while resistance exercise and yoga were comparably effective for those under 55 years.

    View details for DOI 10.1002/cam4.70816

    View details for PubMedID 40145635

    View details for PubMedCentralID PMC11948276

  • Global, regional, and national prevalence of adult overweight and obesity, 1990-2021, with forecasts to 2050: a forecasting study for the Global Burden of Disease Study 2021. Lancet (London, England) 2025; 405 (10481): 813-838

    Abstract

    Overweight and obesity is a global epidemic. Forecasting future trajectories of the epidemic is crucial for providing an evidence base for policy change. In this study, we examine the historical trends of the global, regional, and national prevalence of adult overweight and obesity from 1990 to 2021 and forecast the future trajectories to 2050. Leveraging established methodology from the Global Burden of Diseases, Injuries, and Risk Factors Study, we estimated the prevalence of overweight and obesity among individuals aged 25 years and older by age and sex for 204 countries and territories from 1990 to 2050. Retrospective and current prevalence trends were derived based on both self-reported and measured anthropometric data extracted from 1350 unique sources, which include survey microdata and reports, as well as published literature. Specific adjustment was applied to correct for self-report bias. Spatiotemporal Gaussian process regression models were used to synthesise data, leveraging both spatial and temporal correlation in epidemiological trends, to optimise the comparability of results across time and geographies. To generate forecast estimates, we used forecasts of the Socio-demographic Index and temporal correlation patterns presented as annualised rate of change to inform future trajectories. We considered a reference scenario assuming the continuation of historical trends. Rates of overweight and obesity increased at the global and regional levels, and in all nations, between 1990 and 2021. In 2021, an estimated 1·00 billion (95% uncertainty interval [UI] 0·989-1·01) adult males and 1·11 billion (1·10-1·12) adult females had overweight and obesity. China had the largest population of adults with overweight and obesity (402 million [397-407] individuals), followed by India (180 million [167-194]) and the USA (172 million [169-174]). 
The highest age-standardised prevalence of overweight and obesity was observed in countries in Oceania and north Africa and the Middle East, with many of these countries reporting prevalence of more than 80% in adults. Compared with 1990, the global prevalence of obesity had increased by 155·1% (149·8-160·3) in males and 104·9% (95% UI 100·9-108·8) in females. The most rapid rise in obesity prevalence was observed in the north Africa and the Middle East super-region, where age-standardised prevalence rates in males more than tripled and in females more than doubled. Assuming the continuation of historical trends, by 2050, we forecast that the total number of adults living with overweight and obesity will reach 3·80 billion (95% UI 3·39-4·04), over half of the likely global adult population at that time. While China, India, and the USA will continue to constitute a large proportion of the global population with overweight and obesity, the number in the sub-Saharan Africa super-region is forecasted to increase by 254·8% (234·4-269·5). In Nigeria specifically, the number of adults with overweight and obesity is forecasted to rise to 141 million (121-162) by 2050, making it the country with the fourth-largest population with overweight and obesity. No country to date has successfully curbed the rising rates of adult overweight and obesity. Without immediate and effective intervention, overweight and obesity will continue to increase globally. Particularly in Asia and Africa, driven by growing populations, the number of individuals with overweight and obesity is forecast to rise substantially. These regions will face a considerable increase in obesity-related disease burden. 
Merely acknowledging obesity as a global health issue would be negligent on the part of global health and public health practitioners; more aggressive and targeted measures are required to address this crisis, as obesity is one of the foremost avertible risks to health now and in the future and poses an unparalleled threat of premature disease and death at local, national, and global levels. Funding: Bill & Melinda Gates Foundation.

    View details for DOI 10.1016/S0140-6736(25)00355-1

    View details for PubMedID 40049186

  • Global, regional, and national prevalence of child and adolescent overweight and obesity, 1990-2021, with forecasts to 2050: a forecasting study for the Global Burden of Disease Study 2021. Lancet (London, England) 2025; 405 (10481): 785-812

    Abstract

    Despite the well documented consequences of obesity during childhood and adolescence and future risks of excess body mass on non-communicable diseases in adulthood, coordinated global action on excess body mass in early life is still insufficient. Inconsistent measurement and reporting are a barrier to specific targets, resource allocation, and interventions. In this Article we report current estimates of overweight and obesity across childhood and adolescence, progress over time, and forecasts to inform specific actions. Using established methodology from the Global Burden of Diseases, Injuries, and Risk Factors Study 2021, we modelled overweight and obesity across childhood and adolescence from 1990 to 2021, and then forecasted to 2050. Primary data for our models included 1321 unique measured and self-reported anthropometric data sources from 180 countries and territories from survey microdata, reports, and published literature. These data were used to estimate age-standardised global, regional, and national overweight prevalence and obesity prevalence (separately) for children and young adolescents (aged 5-14 years, typically in school and cared for by child health services) and older adolescents (aged 15-24 years, increasingly out of school and cared for by adult services) by sex for 204 countries and territories from 1990 to 2021. Prevalence estimates from 1990 to 2021 were generated using spatiotemporal Gaussian process regression models, which leveraged temporal and spatial correlation in epidemiological trends to ensure comparability of results across time and geography. Prevalence forecasts from 2022 to 2050 were generated using a generalised ensemble modelling approach assuming continuation of current trends. 
For every age-sex-location population across time (1990-2050), we estimated obesity (vs overweight) predominance using the log ratio of obesity percentage to overweight percentage. Between 1990 and 2021, the combined prevalence of overweight and obesity in children and adolescents doubled, and that of obesity alone tripled. By 2021, 93·1 million (95% uncertainty interval 89·6-96·6) individuals aged 5-14 years and 80·6 million (78·2-83·3) aged 15-24 years had obesity. At the super-region level in 2021, the prevalence of overweight and of obesity was highest in north Africa and the Middle East (eg, United Arab Emirates and Kuwait), and the greatest increase from 1990 to 2021 was seen in southeast Asia, east Asia, and Oceania (eg, Taiwan [province of China], Maldives, and China). By 2021, for females in both age groups, many countries in Australasia (eg, Australia) and in high-income North America (eg, Canada) had already transitioned to obesity predominance, as had males and females in a number of countries in north Africa and the Middle East (eg, United Arab Emirates and Qatar) and Oceania (eg, Cook Islands and American Samoa). From 2022 to 2050, global increases in overweight (not obesity) prevalence are forecasted to stabilise, yet the increase in the absolute proportion of the global population with obesity is forecasted to be greater than between 1990 and 2021, with substantial increases forecast between 2022 and 2030, which continue between 2031 and 2050. By 2050, super-region obesity prevalence is forecasted to remain highest in north Africa and the Middle East (eg, United Arab Emirates and Kuwait), and forecasted increases in obesity are still expected to be largest across southeast Asia, east Asia, and Oceania (eg, Timor-Leste and North Korea), but also in south Asia (eg, Nepal and Bangladesh). 
Compared with those aged 15-24 years, in most super-regions (except Latin America and the Caribbean and the high-income super-region) a greater proportion of those aged 5-14 years are forecasted to have obesity than overweight by 2050. Globally, 15·6% (12·7-17·2) of those aged 5-14 years are forecasted to have obesity by 2050 (186 million [141-221]), compared with 14·2% (11·4-15·7) of those aged 15-24 years (175 million [136-203]). We forecasted that by 2050, there will be more young males (aged 5-14 years) living with obesity (16·5% [13·3-18·3]) than overweight (12·9% [12·2-13·6]); while for females (aged 5-24 years) and older males (aged 15-24 years), overweight will remain more prevalent than obesity. At a regional level, the following populations are forecast to have transitioned to obesity (vs overweight) predominance before 2041-50: children and adolescents (males and females aged 5-24 years) in north Africa and the Middle East and Tropical Latin America; males aged 5-14 years in east Asia, central and southern sub-Saharan Africa, and central Latin America; females aged 5-14 years in Australasia; females aged 15-24 years in Australasia, high-income North America, and southern sub-Saharan Africa; and males aged 15-24 years in high-income North America. Both overweight and obesity increased substantially in every world region between 1990 and 2021, suggesting that current approaches to curbing increases in overweight and obesity have failed a generation of children and adolescents. Beyond 2021, overweight during childhood and adolescence is forecast to stabilise due to further increases in the population who have obesity. Increases in obesity are expected to continue for all populations in all world regions. Because substantial change is forecasted to occur between 2022 and 2030, immediate actions are needed to address this public health crisis. Funding: Bill & Melinda Gates Foundation and Australian National Health and Medical Research Council.

    View details for DOI 10.1016/S0140-6736(25)00397-6

    View details for PubMedID 40049185

  • CXCL12 drives natural variation in coronary artery anatomy across diverse populations. Cell Rios Coronado, P. E., Zhou, J., Fan, X., Zanetti, D., Naftaly, J. A., Prabala, P., Martínez Jaimes, A. M., Farah, E. N., Kundu, S., Deshpande, S. S., Evergreen, I., Kho, P. F., Ma, Q., Hilliard, A. T., Abramowitz, S., Pyarajan, S., Dochtermann, D., Damrauer, S. M., Chang, K. M., Levin, M. G., Winn, V. D., Paşca, A. M., Plomondon, M. E., Waldo, S. W., Tsao, P. S., Kundaje, A., Chi, N. C., Clarke, S. L., Red-Horse, K., Assimes, T. L. 2025

    Abstract

    Coronary arteries have a specific branching pattern crucial for oxygenating heart muscle. Among humans, there is natural variation in coronary anatomy with respect to perfusion of the inferior/posterior left heart, which can branch from either the right arterial tree, the left, or both, a phenotype known as coronary dominance. Using angiographic data for >60,000 US veterans of diverse ancestry, we conducted a genome-wide association study of coronary dominance, revealing moderate heritability and identifying ten significant loci. The strongest association occurred near CXCL12 in both European- and African-ancestry cohorts, with downstream analyses implicating effects on CXCL12 expression. We show that CXCL12 is expressed in human fetal hearts at the time dominance is established. Reducing Cxcl12 in mice altered coronary dominance and caused septal arteries to develop away from Cxcl12 expression domains. These findings indicate that CXCL12 patterns human coronary arteries, paving the way for "medical revascularization" through targeting developmental pathways.

    View details for DOI 10.1016/j.cell.2025.02.005

    View details for PubMedID 40049164

  • Correction: Chiu et al. Insights into Metabolic Reprogramming in Tumor Evolution and Therapy. Cancers 2024, 16, 3513. Cancers Chiu, C. F., Guerrero, J. J., Regalado, R. R., Zhou, J., Notarte, K. I., Lu, Y. W., Encarnacion, P. C., Carles, C. D., Octavo, E. M., Limbaroc, D. C., Saengboonmee, C., Huang, S. Y. 2025; 17 (4)

    Abstract

    Upon further reflection and in consultation with her academic advisors, Ms [...].

    View details for DOI 10.3390/cancers17040686

    View details for PubMedID 39976390

  • Study on the Mechanism of the Chinese Herbal Pair Banxia-Chenpi in Ameliorating Polycystic Ovary Syndrome Based on the CYP17A1 Gene. Journal of ethnopharmacology Shen, C., Li, H., Xiao, M., Jiang, X., Jin, J., Zhou, J., Xiong, B., Chen, Y., Zhao, M. 2025: 119503

    Abstract

    As a typical Traditional Chinese Medicine (TCM) couplet medicine, Arum Ternatum Thunb. (Pinellia ternata (Thunb.) Makino, known as Banxia in Chinese) and Citrus Reticulata (pericarps of Citrus reticulata Blanco, known as Chenpi in Chinese) has been widely used in clinical practice for their properties of drying dampness, resolving phlegm, relieving oppression and masses. According to the TCM theories, the imbalance in fluid metabolism could lead to the accumulation of the excess dampness and phlegm, resulting in the pathological phenotype as 'damp-phlegm syndrome'. It can further lead to polycystic ovary syndrome (PCOS) when this accumulation of the excess fluid presents in uterus, affecting women's fertility and endocrine function. Recent studies have indicated that Banxia-Chenpi herbal pair (BXCP) exhibits significant therapeutic effects on damp-phlegm syndrome, yet the precise mechanisms underlying its anti-PCOS actions remain to be fully elucidated. The objective was to investigate the signaling pathway involved in steroid biosynthesis, particularly the cytochrome P450 family 17, subfamily A, member 1 (CYP17A1), and to evaluate the effects and mechanisms of BXCP in ameliorating PCOS through both in vivo and in vitro experiments. A systematic evaluation was conducted to assess BXCP's effects on serum biochemical indicators and ovarian tissue pathology in a PCOS rat model (induced by high-fat diet + letrozole) and a DHT-induced human granulosa cells (KGN) model. Core targets were screened using absorbed components analysis, bioinformatics, metabolomics, and network analysis. RT-qPCR and Western blot techniques were employed to confirm the expression of CYP17A1 and related signaling molecule expression during BXCP's amelioration of PCOS, both in vivo and in vitro. BXCP significantly ameliorated PCOS in vivo by mitigating weight gain, regulating estrus cycles, and normalizing sex hormone levels in rats. 
It upregulated metabolites related to steroid biosynthesis, including cortolone and progesterone, with CYP19A1, AKR1C3, and HSD17B1 as key regulators of CYP17A1. The main BXCP components, Naringenin and Nobiletin, increased CYP17A1 and CYP19A1 protein expression while decreasing AKR1C3 and HSD17B1. In conclusion, BXCP ameliorates PCOS by activating the CYP17A1-centered steroid biosynthesis pathway. These findings provide new insights into BXCP's clinical potentials in the management of patients with PCOS, highlighting the importance of TCM in modern medicine.

    View details for DOI 10.1016/j.jep.2025.119503

    View details for PubMedID 39961422

  • Nonlinear ageing gero-marker dynamics of transcriptomic profile during calcific aortic valve mouse modeling. Archives of gerontology and geriatrics Li, H., Cui, X., Shang, Z., Yang, W., Lu, A., Guo, H., Cheng, Z., Zhou, J., Wei, Y., Li, M., Chen, G., Yu, Z. 2025; 131: 105777

    Abstract

    The prevention and management of degenerative heart disease remain challenging and could potentially be significantly improved by understanding of ageing biomarker dynamics. In this study, we constructed the calcific aortic valve mouse model at different age points, measured valve function degeneration along with valve calcification, and investigated the nonlinear dynamics using sequencing data and deep learning models. In C57BL/6 N mouse model, the older mice had higher levels of peak transvalvular jet velocity in terms of valve function. Regarding valve calcification, collagen and elastic fiber calcification in the middle layer increased significantly at 48-week-old (p < 0.001), and the calcification spread to the inner endothelial cells at 72-week-old (p < 0.0001). RNA sequencing illustrated that 30 genes, including Acadsb, L2hgdh, and Cpped1, showed increased expression with age. Among them, four genes, namely Hipk2, 9430069I07Rik, Peli3, and Slc22a12, increased more than threefold in aortic tissues in 72-week-old mice compared to 6-week-old mice. Moreover, a large proportion of genes changed in a nonlinear pattern (6,325 out of 12,160, 52%). In conclusion, both linear and nonlinear gero-markers were found in the calcific aortic valve mouse modeling, which highlighted specific periods of significant wave with accelerated ageing (48-week-old in mice).

    View details for DOI 10.1016/j.archger.2025.105777

    View details for PubMedID 39922128

  • External validation of EncephalApp Stroop test to screen minimal hepatic encephalopathy patients with nonalcoholic cirrhosis. World journal of hepatology Jiang, T. T., Liu, X. L., Yu, H., Sun, Y. X., Zhou, J. Y., Yang, Z. Y., Chen, G. 2024; 16 (12): 1450-1457

    Abstract

    Neurocognitive impairment, including minimal hepatic encephalopathy (MHE) and overt hepatic encephalopathy, is one of the most common complications of all types of primary liver diseases, such as hepatitis B, biliary cholangitis, and autoimmune hepatitis. The EncephalApp Stroop test is a smartphone application-based test that is time-saving for MHE screening. However, neurocognitive impairment is different between alcoholic cirrhosis patients and nonalcoholic cirrhosis patients, so the cutoff value for MHE diagnosis might be inflated. To validate the Stroop test in nonalcoholic cirrhosis patients. This external validation was performed at the National Center for Infectious Diseases (Beijing). Liver cirrhosis patients aged between 18 and 65 years who voluntarily enrolled in the study and provided signed informed consent were included. The Psychometric Hepatic Encephalopathy Score (PHES) test was used as the standard diagnostic criterion for MHE. The EncephalApp Stroop test was then performed on the iPad, including two sessions of tests ("off" and "on") to measure patients' ability to differentiate between numbers and letters. We assessed the performance of the EncephalApp Stroop test in terms of the area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value, with the PHES as the standard criterion. A total of 160 nonalcoholic cirrhosis patients were included in this validation study, including 87 (54.4%) patients without MHE and 73 (45.6%) patients with MHE. Taking the PHES as the gold standard, the EncephalApp Stroop test performed well for nonalcoholic liver cirrhosis patients in terms of "off" time [AUC: 0.85, 95% confidence interval (CI): 0.79-0.91] and "on + off" time (AUC: 0.85, 95%CI: 0.80-0.91); however, total runs of "off" session (AUC: 0.61, 95%CI: 0.52-0.69), total runs of "on" session (AUC: 0.57, 95%CI: 0.48-0.65), and "on - off" time (AUC: 0.54, 95%CI: 0.44-0.63) were comparatively low. 
The optimal cutoff points were "off" time > 101.93 seconds and "on + off" time > 205.86 seconds, with sensitivities of 0.84 and 0.90, specificities of 0.77 and 0.71, positive predictive values of 0.75 and 0.72, and negative predictive values of 0.85 and 0.89, respectively. Our results suggest that different cutoffs should be used for the EncephalApp Stroop tool for MHE screening between alcoholic and nonalcoholic living patients, which is a critical check before generalization to screen for neurocognitive impairment among the whole population of chronic liver diseases.

    View details for DOI 10.4254/wjh.v16.i12.1450

    View details for PubMedID 39744193

    View details for PubMedCentralID PMC11686544

  • Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis. Journal of healthcare informatics research Yu, H., Fan, L., Li, L., Zhou, J., Ma, Z., Xian, L., Hua, W., He, S., Jin, M., Zhang, Y., Gandhi, A., Ma, X. 2024; 8 (4): 658-711

    Abstract

    Large language models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), potentially enabling new ways to analyze data, treat patients, and conduct research. This study aims to provide a comprehensive overview of LLM applications in BHI, highlighting their transformative potential and addressing the associated ethical and practical challenges. We reviewed 1698 research articles from January 2022 to December 2023, categorizing them by research themes and diagnostic categories. Additionally, we conducted network analysis to map scholarly collaborations and research dynamics. Our findings reveal a substantial increase in the potential applications of LLMs to a variety of BHI tasks, including clinical decision support, patient interaction, and medical document analysis. Notably, LLMs are expected to be instrumental in enhancing the accuracy of diagnostic tools and patient care protocols. The network analysis highlights dense and dynamically evolving collaborations across institutions, underscoring the interdisciplinary nature of LLM research in BHI. A significant trend was the application of LLMs in managing specific disease categories, such as mental health and neurological disorders, demonstrating their potential to influence personalized medicine and public health strategies. LLMs hold promising potential to further transform biomedical research and healthcare delivery. While promising, the ethical implications and challenges of model validation call for rigorous scrutiny to optimize their benefits in clinical settings. This survey serves as a resource for stakeholders in healthcare, including researchers, clinicians, and policymakers, to understand the current state and future potential of LLMs in BHI.

    View details for DOI 10.1007/s41666-024-00171-8

    View details for PubMedID 39463859

    View details for PubMedCentralID PMC11499577

  • Insights into Metabolic Reprogramming in Tumor Evolution and Therapy. Cancers Chiu, C. F., Guerrero, J. J., Regalado, R. R., Zamora, M. J., Zhou, J., Notarte, K. I., Lu, Y. W., Encarnacion, P. C., Carles, C. D., Octavo, E. M., Limbaroc, D. C., Saengboonmee, C., Huang, S. Y. 2024; 16 (20)

    Abstract

    Background: Cancer remains a global health challenge, characterized not just by uncontrolled cell proliferation but also by the complex metabolic reprogramming that underlies its development and progression. Objectives: This review delves into the intricate relationship between cancer and its metabolic alterations, drawing an innovative comparison with the cosmological concepts of dark matter and dark energy to highlight the pivotal yet often overlooked role of metabolic reprogramming in tumor evolution. Methods: It scrutinizes the Warburg effect and other metabolic adaptations, such as shifts in lipid synthesis, amino acid turnover, and mitochondrial function, driven by mutations in key regulatory genes. Results: This review emphasizes the significance of targeting these metabolic pathways for therapeutic intervention, outlining the potential to disrupt cancer's energy supply and signaling mechanisms. It calls for an interdisciplinary research approach to fully understand and exploit the intricacies of cancer metabolism, pointing toward metabolic reprogramming as a promising frontier for developing more effective cancer treatments. Conclusion: By equating cancer's metabolic complexity with the enigmatic nature of dark matter and energy, this review underscores the critical need for innovative strategies in oncology, highlighting the importance of unveiling and targeting the "dark energy" within cancer cells to revolutionize future therapy and research.

    View details for DOI 10.3390/cancers16203513

    View details for PubMedID 39456607

    View details for PubMedCentralID PMC11506062

  • PAGER: A novel genotype encoding strategy for modeling deviations from additivity in complex trait association studies. BioData mining Freda, P. J., Ghosh, A., Bhandary, P., Matsumoto, N., Chitre, A. S., Zhou, J., Hall, M. A., Palmer, A. A., Obafemi-Ajayi, T., Moore, J. H. 2024; 17 (1): 41

    Abstract

    The additive model of inheritance assumes that heterozygotes (Aa) are exactly intermediate in respect to homozygotes (AA and aa). While this model is commonly used in single-locus genetic association studies, significant deviations from additivity are well-documented and contribute to phenotypic variance across many traits and systems. This assumption can introduce type I and type II errors by overestimating or underestimating the effects of variants that deviate from additivity. Alternative genotype encoding strategies have been explored to account for different inheritance patterns, but they often incur significant computational or methodological costs. To address these challenges, we introduce PAGER (Phenotype Adjusted Genotype Encoding and Ranking), an efficient pre-processing method that encodes each genetic variant based on normalized mean phenotypic differences between diallelic genotype classes (AA, Aa, and aa). This approach more accurately reflects each variant's true inheritance model, improving model precision while minimizing the costs associated with alternative encoding strategies. Through extensive benchmarking on SNPs simulated with both binary and continuous phenotypes, we demonstrate that PAGER accurately represents various inheritance patterns (including additive, dominant, recessive, and heterosis), achieves levels of statistical power that meet or exceed other encoding strategies, and attains computation speeds up to 55 times faster than a similar method, EDGE. We also apply PAGER to publicly available real-world data and identify a novel, relevant putative QTL associated with body mass index in rats (Rattus norvegicus) that is not detected with the additive model. Overall, we show that PAGER is an efficient genotype encoding approach that can uncover sources of missing heritability and reveal novel insights in the study of complex traits while incurring minimal costs.

    View details for DOI 10.1186/s13040-024-00393-x

    View details for PubMedID 39394173

    View details for PubMedCentralID PMC11468469

  • Per- and poly-fluoroalkyl substances (PFAS) accelerate biological aging mediated by increased C-reactive protein. Journal of hazardous materials Zhao, Z., Zhou, J., Shi, A., Wang, J., Li, H., Yin, X., Gao, J., Wu, Y., Li, J., Sun, Y. X., Yan, H., Li, Y., Chen, G. 2024; 480: 136090

    Abstract

    Unhealthy biological aging is related to higher incidence of varied age-related diseases, even higher all-cause mortality. A previous small-sample study suggested that per- and poly-fluoroalkyl substances (PFAS) were associated with biological aging, but the evidence of exposure-response relationships, potential effect modifiers, and potential mediators was not investigated. Therefore, we conducted a cross-sectional analysis of a national study including 14,865 adults in the US from 8 survey cycles of NHANES from 2003 to 2018, to investigate the associations of PFAS compounds in body serum, including perfluorooctanoic acid (PFOA), perfluorooctane sulfonic acid (PFOS), perfluorononanoic acid (PFNA), and perfluorohexane sulfonic acid (PFHxS), with biological aging. Generalized linear models showed that higher human exposure to PFAS was associated with accelerated biological aging. Importantly, human exposure to PFOA, PFOS, PFNA, and PFHxS with detected level (above 0.10 ng/mL) was associated with an average of 3.3 years (95% CI: 2.7, 3.9, P < 0.001), 14.9 years (95% CI: 7.2, 22.7, P < 0.001), 10.9 years (95% CI: 3.9, 17.7, P < 0.001), and 8.8 years (95% CI: 4.8, 12.9, P < 0.001) of biological aging acceleration. Cubic spline models indicated exposure-response relationships where there was no safe threshold of PFAS level regarding harms to human healthy aging. The weighted sum regression model found the significant associations of PFAS compound mixture with biological aging acceleration, and PFOA was the dominant contributor among 4 PFAS compounds. Mediation analysis suggested that C-reactive protein, one of the inflammation biomarkers, might act as a mediator in PFAS-induced accelerated biological aging, but not Triglyceride-glucose index. In summary, our study suggests that the effects of PFAS on biological aging acceleration should be of concern and more action plans to address their negative impact on human health should be launched.

    View details for DOI 10.1016/j.jhazmat.2024.136090

    View details for PubMedID 39405719

  • Plasma proteomics and carotid intima-media thickness in the UK biobank cohort. Frontiers in cardiovascular medicine Chen, M. L., Kho, P. F., Guarischi-Sousa, R., Zhou, J., Panyard, D. J., Azizi, Z., Gupte, T., Watson, K., Abbasi, F., Assimes, T. L. 2024; 11: 1478600

    Abstract

    Ultrasound derived carotid intima-media thickness (cIMT) is valuable for cardiovascular risk stratification. We assessed the relative importance of traditional atherosclerosis risk factors and plasma proteins in predicting cIMT measured nearly a decade later. We examined 6,136 UK Biobank participants with 1,461 proteins profiled using the proximity extension assay applied to their baseline blood draw who subsequently underwent a cIMT measurement. We implemented linear regression, stepwise Akaike Information Criterion-based, and the least absolute shrinkage and selection operator (LASSO) models to identify potential proteomic as well as non-proteomic predictors. We evaluated our model performance using the proportion variance explained (R²). The mean time from baseline assessment to cIMT measurement was 9.2 years. Age, blood pressure, and anthropometric related variables were the strongest predictors of cIMT with fat-free mass index of the truncal region being the strongest predictor among adiposity measurements. A LASSO model incorporating variables including age, assessment center, genetic risk factors, smoking, blood pressure, trunk fat-free mass index, apolipoprotein B, and Townsend deprivation index combined with 97 proteins achieved the highest R² (0.308, 95% C.I. 0.274, 0.341). In contrast, models built with proteins alone or non-proteomic variables alone explained a notably lower R² (0.261, 0.228-0.294 and 0.260, 0.226-0.293, respectively). Chromogranin b (CHGB), Cystatin-M/E (CST6), leptin (LEP), and prolargin (PRELP) were the proteins consistently selected across all models. Plasma proteins add to the clinical and genetic risk factors in predicting a cIMT measurement. Our findings implicate blood pressure and extracellular matrix-related proteins in cIMT pathophysiology.

    View details for DOI 10.3389/fcvm.2024.1478600

    View details for PubMedID 39416432

    View details for PubMedCentralID PMC11480011

  • A plasma proteomic signature for atherosclerotic cardiovascular disease risk prediction in the UK Biobank cohort. medRxiv : the preprint server for health sciences Gupte, T. P., Azizi, Z., Kho, P. F., Zhou, J., Chen, M., Panyard, D. J., Guarischi-Sousa, R., Hilliard, A. T., Sharma, D., Watson, K., Abbasi, F., Tsao, P. S., Clarke, S. L., Assimes, T. L. 2024

    Abstract

    Background: While risk stratification for atherosclerotic cardiovascular disease (ASCVD) is essential for primary prevention, current clinical risk algorithms demonstrate variability and leave room for further improvement. The plasma proteome holds promise as a future diagnostic and prognostic tool that can accurately reflect complex human traits and disease processes. We assessed the ability of plasma proteins to predict ASCVD. Method: Clinical, genetic, and high-throughput plasma proteomic data were analyzed for association with ASCVD in a cohort of 41,650 UK Biobank participants. Selected features for analysis included clinical variables such as a UK-based cardiovascular clinical risk score (QRISK3) and lipid levels, 36 polygenic risk scores (PRSs), and Olink protein expression data of 2,920 proteins. We used least absolute shrinkage and selection operator (LASSO) regression to select features and compared area under the curve (AUC) statistics between data types. Randomized LASSO regression with a stability selection algorithm identified a smaller set of more robustly associated proteins. The benefit of plasma proteins over standard clinical variables, the QRISK3 score, and PRSs was evaluated through the derivation of Delta AUC values. We also assessed the incremental gain in model performance using proteomic datasets with varying numbers of proteins. To identify potential causal proteins for ASCVD, we conducted a two-sample Mendelian randomization (MR) analysis. Result: The mean age of our cohort was 56.0 years, 60.3% were female, and 9.8% developed incident ASCVD over a median follow-up of 6.9 years. A protein-only LASSO model selected 294 proteins and returned an AUC of 0.723 (95% CI 0.708-0.737). A clinical variable and PRS-only LASSO model selected 4 clinical variables and 20 PRSs and achieved an AUC of 0.726 (95% CI 0.712-0.741). The addition of the full proteomic dataset to clinical variables and PRSs resulted in a Delta AUC of 0.010 (95% CI 0.003-0.018). Fifteen proteins selected by a stability selection algorithm offered improvement in ASCVD prediction over the QRISK3 risk score [Delta AUC: 0.013 (95% CI 0.005-0.021)]. Filtered and clustered versions of the full proteomic dataset (consisting of 600-1,500 proteins) performed comparably to the full dataset for ASCVD prediction. Using MR, we identified 11 proteins as potentially causal for ASCVD. Conclusion: A plasma proteomic signature performs well for incident ASCVD prediction but only modestly improves prediction over clinical and genetic factors. Further studies are warranted to better elucidate the clinical utility of this signature in predicting the risk of ASCVD over the standard practice of using the QRISK3 score.

    View details for DOI 10.1101/2024.09.13.24313652

    View details for PubMedID 39314942

  • Plasma proteomic signatures for type 2 diabetes mellitus and related traits in the UK Biobank cohort. medRxiv : the preprint server for health sciences Gupte, T. P., Azizi, Z., Kho, P. F., Zhou, J., Nzenkue, K., Chen, M., Panyard, D. J., Guarischi-Sousa, R., Hilliard, A. T., Sharma, D., Watson, K., Abbasi, F., Tsao, P. S., Clarke, S. L., Assimes, T. L. 2024

    Abstract

    Aims/hypothesis: The plasma proteome holds promise as a diagnostic and prognostic tool that can accurately reflect complex human traits and disease processes. We assessed the ability of plasma proteins to predict type 2 diabetes mellitus (T2DM) and related traits. Methods: Clinical, genetic, and high-throughput proteomic data from three subcohorts of UK Biobank participants were analyzed for association with dual-energy x-ray absorptiometry (DXA) derived truncal fat (in the adiposity subcohort), estimated maximum oxygen consumption (VO2 max) (in the fitness subcohort), and incident T2DM (in the T2DM subcohort). We used least absolute shrinkage and selection operator (LASSO) regression to assess the relative ability of non-proteomic and proteomic variables to associate with each trait by comparing variance explained (R²) and area under the curve (AUC) statistics between data types. Stability selection with randomized LASSO regression identified the most robustly associated proteins for each trait. The benefit of proteomic signatures (PSs) over QDiabetes, a T2DM clinical risk score, was evaluated through the derivation of delta (Delta) AUC values. We also assessed the incremental gain in model performance metrics using proteomic datasets with varying numbers of proteins. A series of two-sample Mendelian randomization (MR) analyses were conducted to identify potentially causal proteins for adiposity, fitness, and T2DM. Results: Across all three subcohorts, the mean age was 56.7 years and 54.9% were female. In the T2DM subcohort, 5.8% developed incident T2DM over a median follow-up of 7.6 years. LASSO-derived PSs increased the R² of truncal fat and VO2 max over clinical and genetic factors by 0.074 and 0.057, respectively. We observed a similar improvement in T2DM prediction over the QDiabetes score [Delta AUC: 0.016 (95% CI 0.008, 0.024)] when using a robust PS derived strictly from the T2DM outcome versus a model further augmented with non-overlapping proteins associated with adiposity and fitness. A small number of proteins (29 for truncal adiposity, 18 for VO2 max, and 26 for T2DM) identified by stability selection algorithms offered most of the improvement in prediction of each outcome. Filtered and clustered versions of the full proteomic dataset supplied by the UK Biobank (ranging between 600-1,500 proteins) performed comparably to the full dataset for T2DM prediction. Using MR, we identified 4 proteins as potentially causal for adiposity, 1 as potentially causal for fitness, and 4 as potentially causal for T2DM. Conclusions/Interpretation: Plasma PSs modestly improve the prediction of incident T2DM over that possible with clinical and genetic factors. Further studies are warranted to better elucidate the clinical utility of these signatures in predicting the risk of T2DM over the standard practice of using the QDiabetes score. Candidate causally associated proteins identified through MR deserve further study as potential novel therapeutic targets for T2DM.

    View details for DOI 10.1101/2024.09.13.24313501

    View details for PubMedID 39314935

  • Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis JOURNAL OF HEALTHCARE INFORMATICS RESEARCH Yu, H., Fan, L., Li, L., Zhou, J., Ma, Z., Xian, L., Hua, W., He, S., Jin, M., Zhang, Y., Gandhi, A., Ma, X. 2024
  • A novel temperature-controlled device with standardized manipulation improves chronic back pain mediated by modulating deep muscle thickness: A multicenter randomized controlled trial CLINICAL AND TRANSLATIONAL DISCOVERY Li, L., Wang, Y., Gao, Y., Liu, S., Yang, G., Lv, X., Sun, Y., Wu, Y., Li, J., Zhou, J., Chen, G. 2024; 4 (4)

    View details for DOI 10.1002/ctd2.330

    View details for Web of Science ID 001255534200001

  • The global clinical studies of long COVID. International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases Ramonfaur, D., Ayad, N., Liu, P. H., Zhou, J., Wu, Y., Li, J., Chen, G. 2024: 107105

    Abstract

    Long COVID refers to symptoms, signs, and conditions that persist after the initial phase of SARS-CoV-2 infection. The incidence of long COVID varies among regions (31% in North America, 44% in Europe, and 51% in Asia), which challenges healthcare systems, yet there are limited guidelines for its treatment. With more and more nationwide projects funded by governments, such as the RECOVER initiative in the US and NIHR funding in the UK, an increasing number of ongoing clinical trials are investigating the efficacy of diverse therapies in reversing long COVID. A search of the WHO International Clinical Trial Registry Platform identified 587 clinical studies as long COVID studies. Among these, 312 studies (53.2%) are testing potential therapies. The largest share of long COVID trials was conducted in the United States (58 trials [18.6%]), followed by India (55 trials [17.6%]) and Spain (20 trials [6.4%]). Interventions in these clinical trials include physical exercise, rehabilitation therapy, behavioral therapy, and pharmacological therapies including herbs, paxlovid, and fluvoxamine. These trials aim to address long COVID symptoms and signs including fatigue, decreased pulmonary function, reduced cognitive function, and others. To date, only 11 of these 312 studies have published their results, which unfortunately were not confirmatory. Future studies should be designed to address sleep disorders, which were seldom included in registered clinical studies. Moreover, interventions aimed at treating the underlying pathophysiology of long COVID are also necessary but currently lacking.

    View details for DOI 10.1016/j.ijid.2024.107105

    View details for PubMedID 38782355

  • Infusion reactions to adeno-associated virus (AAV)-based gene therapy: Mechanisms, diagnostics, treatment and review of the literature CLINICAL IMMUNOLOGY Catahay, J., Notarte, K., Macasaet, R., Liu, J., Velasco, J., Peligro, P., Vallo, J., Lahoti, L., Zhou, J., Henry, B. 2024; 262
  • PLASMA PROTEOMICS AND VISCERAL ADIPOSE TISSUE VOLUME: A MACHINE LEARNING ANALYSIS OF INTERACTION BETWEEN BIOMARKERS, SOCIO-BEHAVIORAL, AND FITNESS FACTORS IN UK BIOBANK Azizi, Z., Gupte, T., Kho, P., Nzenkue, K., Zhou, J., Guarischi-Sousa, R., Panyard, D., Chen, M., Abbasi, F., Clarke, S., Tsao, P., Assimes, T. L. ELSEVIER SCIENCE INC. 2024: 1699
  • Infusion reactions to adeno-associated virus (AAV)-based gene therapy: Mechanisms, diagnostics, treatment and review of the literature. Journal of medical virology Notarte, K. I., Catahay, J. A., Macasaet, R., Liu, J., Velasco, J. V., Peligro, P. J., Vallo, J., Goldrich, N., Lahoti, L., Zhou, J., Henry, B. M. 2023; 95 (12): e29305

    Abstract

    The use of adeno-associated virus (AAV) vectors in gene therapy has demonstrated great potential in treating genetic disorders. However, infusion-associated reactions (IARs) pose a significant challenge to the safety and efficacy of AAV-based gene therapy. This review provides a comprehensive summary of the current understanding of IARs to AAV therapy, including their underlying mechanisms, clinical presentation, and treatment options. Toll-like receptor activation and subsequent production of pro-inflammatory cytokines are associated with IARs, stimulating neutralizing antibodies (Nabs) and T-cell responses that interfere with gene therapy. Risk factors for IARs include high titers of pre-existing Nabs, previous exposure to AAV, and specific comorbidities. Clinical presentation ranges from mild flu-like symptoms to severe anaphylaxis and can occur during or after AAV administration. There are no established guidelines for pre- and postadministration tests for AAV therapies, and routine laboratory requests are not standardized. Treatment options include corticosteroids, plasmapheresis, and supportive medications such as antihistamines and acetaminophen, but there is no consensus on the route of administration, dosage, and duration. This review highlights the inadequacy of current treatment regimens for IARs and the need for further research to improve the safety and efficacy of AAV-based gene therapy.

    View details for DOI 10.1002/jmv.29305

    View details for PubMedID 38116715

  • CXCL12 regulates coronary artery dominance in diverse populations and links development to disease. medRxiv : the preprint server for health sciences Rios Coronado, P. E., Zanetti, D., Zhou, J., Naftaly, J. A., Prabala, P., Kho, P. F., Martínez Jaimes, A. M., Hilliard, A. T., Pyarajan, S., Dochtermann, D., Chang, K. M., Winn, V. D., Pașca, A. M., Plomondon, M. E., Waldo, S. W., Tsao, P. S., Clarke, S. L., Red-Horse, K., Assimes, T. L. 2023

    Abstract

    Mammalian cardiac muscle is supplied with blood by right and left coronary arteries that form branches covering both ventricles of the heart. Whether branches of the right or left coronary arteries wrap around to the inferior side of the left ventricle is variable in humans and termed right or left dominance. Coronary dominance is likely a heritable trait, but its genetic architecture has never been explored. Here, we present the first large-scale multi-ancestry genome-wide association study of dominance in 61,043 participants of the VA Million Veteran Program, including over 10,300 Africans and 4,400 Admixed Americans. Dominance was moderately heritable with ten loci reaching genome wide significance. The most significant mapped to the chemokine CXCL12 in both Europeans and Africans. Whole-organ imaging of human fetal hearts revealed that dominance is established during development in locations where CXCL12 is expressed. In mice, dominance involved the septal coronary artery, and its patterning was altered with Cxcl12 deficiency. Finally, we linked human dominance patterns with coronary artery disease through colocalization, genome-wide genetic correlation and Mendelian Randomization analyses. Together, our data supports CXCL12 as a primary determinant of coronary artery dominance in humans of diverse backgrounds and suggests that developmental patterning of arteries may influence one's susceptibility to ischemic heart disease.

    View details for DOI 10.1101/2023.10.27.23297507

    View details for PubMedID 37961706

    View details for PubMedCentralID PMC10635223

  • Heat-stone massage for patients with chronic musculoskeletal pain: a protocol for multicenter randomized controlled trial. Frontiers in medicine Li, L., Xi, Y., Wang, Y., Gao, Y., Lv, X., Liu, S., Yang, G., Qian, J., Yang, X., Ayad, N., Zhou, J., Sun, Y. X., Liu, J., Li, J., Chen, G. 2023; 10: 1215858

    Abstract

    Chronic musculoskeletal pain impairs the quality of life of approximately 1.71 billion people worldwide. Although pharmacological therapies play an important role in controlling chronic pain, overuse of opioids, persistent or recurrent symptoms, and pain-related disability burden still need to be addressed. Heat-stone massage uses heated stones to stimulate muscles and ligaments, followed by massage for relaxation, and can potentially treat chronic musculoskeletal pain. The efficacy and safety of heat-stone massage for patients with chronic musculoskeletal pain need to be determined. This multicenter, 2-arm, randomized, positive drug-controlled trial will include a total of 120 patients with chronic musculoskeletal pain. The intervention group will receive a 2-week heat-stone massage, 3 times per week, whereas the control group will receive flurbiprofen plaster twice per day for 2 weeks. The primary end point is the change in Global Pain Scale from baseline to the end of the 2-week intervention. The secondary outcomes include pain severity (Numerical Rating Scale), pain acceptance (Chronic Pain Acceptance Questionnaire), self-management (Health Education Impact Questionnaire), self-efficacy (Pain Self-Efficacy Questionnaire), anxiety and depression (Hospital Anxiety and Depression Scale), and quality of life (Short Form-36). The intention-to-treat dataset will be used for analysis. Pain management remains a research topic that patients pay close attention to. This will be the first randomized clinical trial to evaluate whether heat-stone massage, a non-pharmacological therapy, is effective in chronic musculoskeletal pain management. The results will provide evidence for a new option in daily practice. World Health Organization Chinese Clinical Trial Registry [ChiCTR2200065654; https://www.chictr.org.cn/showproj.html?proj=185403]; International Traditional Medicine Clinical Trial Registry [ITMCTR2022000104; http://itmctr.ccebtcm.org.cn/en-US/Home/ProjectView?pid=51776b6f-77b8-4811-9b5a-a0fec10f2cee].

    View details for DOI 10.3389/fmed.2023.1215858

    View details for PubMedID 37654653

    View details for PubMedCentralID PMC10466406

  • Activation of GPR44 decreases severity of myeloid leukemia via specific targeting of leukemia initiating stem cells. Cell reports Qian, F., Nettleford, S. K., Zhou, J., Arner, B. E., Hall, M. A., Sharma, A., Annageldiyev, C., Rossi, R. M., Tukaramrao, D. B., Sarkar, D., Hegde, S., Gandhi, U. H., Finch, E. R., Goodfield, L., Quickel, M. D., Claxton, D. F., Paulson, R. F., Prabhu, K. S. 2023; 42 (7): 112794

    Abstract

    Relapse of acute myeloid leukemia (AML) remains a significant concern due to persistent leukemia-initiating stem cells (LICs) that are typically not targeted by most existing therapies. Using a murine AML model, human AML cell lines, and patient samples, we show that AML LICs are sensitive to endogenous and exogenous cyclopentenone prostaglandin-J (CyPG), Δ12-PGJ2, and 15d-PGJ2, which are increased upon dietary selenium supplementation via the cyclooxygenase-hematopoietic PGD synthase pathway. CyPGs are endogenous ligands for peroxisome proliferator-activated receptor gamma and GPR44 (CRTH2; PTGDR2). Deletion of GPR44 in a mouse model of AML exacerbated the disease suggesting that GPR44 activation mediates selenium-mediated apoptosis of LICs. Transcriptomic analysis of GPR44-/- LICs indicated that GPR44 activation by CyPGs suppressed KRAS-mediated MAPK and PI3K/AKT/mTOR signaling pathways, to enhance apoptosis. Our studies show the role of GPR44, providing mechanistic underpinnings of the chemopreventive and chemotherapeutic properties of selenium and CyPGs in AML.

    View details for DOI 10.1016/j.celrep.2023.112794

    View details for PubMedID 37459233

  • Dynamic assessment of the COVID-19 vaccine acceptance leveraging social media data JOURNAL OF BIOMEDICAL INFORMATICS Li, L., Zhou, J., Ma, Z., Bensi, M. T., Hall, M. A., Baecher, G. B. 2022; 129: 104054

    Abstract

    Vaccination is the most effective way to provide long-lasting immunity against viral infection; thus, rapid assessment of vaccine acceptance is a pressing challenge for health authorities. Prior studies have applied survey techniques to investigate vaccine acceptance, but these may be slow and expensive. This study investigates 29 million vaccine-related tweets from August 8, 2020 to April 19, 2021 and proposes a social media-based approach that derives a vaccine acceptance index (VAI) to quantify Twitter users' opinions on COVID-19 vaccination. This index is calculated based on opinion classifications identified with the aid of natural language processing techniques and provides a quantitative metric to indicate the level of vaccine acceptance across different geographic scales in the U.S. The VAI is easily calculated from the number of positive and negative tweets posted by specific users and groups of users; it can be compiled for regions such as counties or states to provide geospatial information, and it can be tracked over time to assess changes in vaccine acceptance as related to trends in the media and politics. At the national level, the VAI moved from negative to positive in 2020 and remained steady after January 2021. Through exploratory analysis of state- and county-level data, reliable assessments of VAI against subsequent vaccination rates could be made for counties with at least 30 users. The paper discusses information characteristics that enable consistent estimation of VAI. The findings support the use of social media to understand opinions and to offer a timely and cost-effective way to assess vaccine acceptance.

    View details for DOI 10.1016/j.jbi.2022.104054

    View details for Web of Science ID 000788753600001

    View details for PubMedID 35331966

    View details for PubMedCentralID PMC8935963

  • Novel EDGE encoding method enhances ability to identify genetic interactions PLOS GENETICS Hall, M. A., Wallace, J., Lucas, A. M., Bradford, Y., Verma, S. S., Mueller-Myhsok, B., Passero, K., Zhou, J., McGuigan, J., Jiang, B., Pendergrass, S. A., Zhang, Y., Peissig, P., Brilliant, M., Sleiman, P., Hakonarson, H., Harley, J. B., Kiryluk, K., Van Steen, K., Moore, J. H., Ritchie, M. D. 2021; 17 (6): e1009534

    Abstract

    Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). 
We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.

    View details for DOI 10.1371/journal.pgen.1009534

    View details for Web of Science ID 000664356500001

    View details for PubMedID 34086673

    View details for PubMedCentralID PMC8208534

  • Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Passero, K., He, X., Zhou, J., Mueller-Myhsok, B., Kleber, M. E., Maerz, W., Hall, M. A. 2020; 25: 659-670

    Abstract

    Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well-defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, utilizing the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acids levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significant in association with dihomo-γ-linolenic acid, of which five were within fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that is modifiable for application to other large datasets with heterogenous phenotypes. 
As investigation of complex traits continues beyond traditional genome wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial to the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.

    View details for PubMedID 31797636

  • Investigation of gene-gene interactions in cardiac traits and serum fatty acid levels in the LURIC Health Study PLOS ONE Zhou, J., Passero, K., Palmiero, N. E., Mueller-Myhsok, B., Kleber, M. E., Maerz, W., Hall, M. A. 2020; 15 (9): e0238304

    Abstract

    Epistasis analysis elucidates the effects of gene-gene interactions (G×G) between multiple loci for complex traits. However, the large computational demands and the high multiple testing burden impede the discovery of such interactions. Here, we illustrate the utilization of two methods, main effect filtering based on individual GWAS results and biological knowledge-based modeling through Biofilter software, to reduce the number of interactions tested among single nucleotide polymorphisms (SNPs) for 15 cardiac-related traits and 14 fatty acids. We performed interaction analyses using the two filtering methods, adjusting for age, sex, body mass index (BMI), waist-hip ratio, and the first three principal components from genetic data, among 2,824 samples from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study. Using Biofilter, one interaction nearly met Bonferroni significance: an interaction between rs7735781 in XRCC4 and rs10804247 in XRCC5 was identified for venous thrombosis with a Bonferroni-adjusted likelihood ratio test (LRT) p: 0.0627. A total of 57 interactions were identified from main effect filtering for the cardiac traits G×G (10) and fatty acids G×G (47) at Bonferroni-adjusted LRT p < 0.05. For cardiac traits, the top interaction involved SNPs rs1383819 in SNTG1 and rs1493939 (138 kb from 5' of SAMD12) with Bonferroni-adjusted LRT p: 0.0228, which was significantly associated with history of arterial hypertension. For fatty acids, the top interaction between rs4839193 in KCND3 and rs10829717 in LOC107984002 with Bonferroni-adjusted LRT p: 2.28×10⁻⁵ was associated with 9-trans 12-trans octadecanoic acid, an omega-6 trans fatty acid. The model inflation factor for the interactions under different filtering methods was evaluated from the standard median and the linear regression approach. Here, we applied filtering approaches to identify numerous genetic interactions related to cardiac-related outcomes as potential targets for therapy. The approaches described offer ways to detect epistasis in complex traits and to improve precision medicine capability.

    View details for DOI 10.1371/journal.pone.0238304

    View details for Web of Science ID 000571887500145

    View details for PubMedID 32915819

    View details for PubMedCentralID PMC7485803

  • Long Non-coding RNA TDRKH-AS1 Promotes Colorectal Cancer Cell Proliferation and Invasion Through the beta-Catenin Activated Wnt Signaling Pathway FRONTIERS IN ONCOLOGY Jiao, Y., Zhou, J., Jin, Y., Yang, Y., Song, M., Zhang, L., Zhou, J., Zhang, J. 2020; 10: 639

    Abstract

    Colorectal cancer (CRC) is a common cancer worldwide, with a low 5-year survival rate. Recently, long non-coding RNAs (lncRNAs) have been well studied as oncogenes or tumor suppressors in multiple malignancies, including CRC. However, their biological functions and potential mechanisms in human cancer remain unclear. Here, we evaluated the expression of TDRKH-AS1 in CRC tissues and identified its potential targets. We found that TDRKH-AS1 is upregulated in the majority of CRC patients and is significantly correlated with their malignant characteristics and dismal prognoses. In in vitro experiments, high expression of TDRKH-AS1 substantially promoted cancer cell proliferation and invasion. We also found that TDRKH-AS1 targets β-catenin in the Wnt signaling pathway to exert its carcinogenic activity. TDRKH-AS1 could serve as a promising prognostic predictor and a potential therapeutic target for further early diagnoses and treatments via a non-invasive method.

    View details for DOI 10.3389/fonc.2020.00639

    View details for Web of Science ID 000538517900001

    View details for PubMedID 32670860

    View details for PubMedCentralID PMC7326065

  • CLARITE Facilitates the Quality Control and Analysis Process for EWAS of Metabolic-Related Traits FRONTIERS IN GENETICS Lucas, A. M., Palmiero, N. E., McGuigan, J., Passero, K., Zhou, J., Orie, D., Ritchie, M. D., Hall, M. A. 2019; 10: 1240

    Abstract

    While genome-wide association studies are an established method of identifying genetic variants associated with disease, environment-wide association studies (EWAS) highlight the contribution of nongenetic components to complex phenotypes. However, the lack of high-throughput quality control (QC) pipelines for EWAS data lends itself to analysis plans where the data are cleaned after a first-pass analysis, which can lead to bias, or are cleaned manually, which is arduous and susceptible to user error. We offer a novel software, CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures (CLARITE), as a tool to efficiently clean environmental data, perform regression analysis, and visualize results on a single platform through user-guided automation. It exists as both an R package and a Python package. Though CLARITE focuses on EWAS, it is intended to also improve the QC process for phenotypes and clinical lab measures for a variety of downstream analyses, including phenome-wide association studies and gene-environment interaction studies. With the goal of demonstrating the utility of CLARITE, we performed a novel EWAS in the National Health and Nutrition Examination Survey (NHANES) (overall N: Discovery = 9063, Replication = 9874) for body mass index (BMI) and over 300 environment variables post-QC, adjusting for sex, age, race, socioeconomic status, and survey year. The analysis used survey weights along with cluster and strata information in order to account for the complex survey design. Sixteen BMI results replicated at a Bonferroni corrected p < 0.05. The top replicating results were serum levels of γ-tocopherol (vitamin E) (Discovery Bonferroni p: 8.67×10⁻¹², Replication Bonferroni p: 2.70×10⁻⁹) and iron (Discovery Bonferroni p: 1.09×10⁻⁸, Replication Bonferroni p: 1.73×10⁻¹⁰). Results of this EWAS are important to consider for metabolic trait analysis, as BMI is tightly associated with these phenotypes. 
As such, exposures predictive of BMI may be useful for covariate and/or interaction assessment of metabolic-related traits. CLARITE allows improved data quality for EWAS, gene-environment interactions, and phenome-wide association studies by establishing a high-throughput quality control infrastructure. Thus, CLARITE is recommended for studying the environmental factors underlying complex disease.

    View details for DOI 10.3389/fgene.2019.01240

    View details for Web of Science ID 000504982600001

    View details for PubMedID 31921293

    View details for PubMedCentralID PMC6930237