Stanford Advisors


  • Lu Tian, Postdoctoral Faculty Sponsor

All Publications


  • imply: improving cell-type deconvolution accuracy using personalized reference profiles. Genome medicine Meng, G., Pan, Y., Tang, W., Zhang, L., Cui, Y., Schumacher, F. R., Wang, M., Wang, R., He, S., Krischer, J., Li, Q., Feng, H. 2024; 16 (1): 65

    Abstract

    Using computational tools, bulk transcriptomics can be deconvoluted to estimate the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, ignoring person-to-person heterogeneity. Here, we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. Simulation studies demonstrate reduced bias compared with existing methods. Real data analyses on longitudinal consortia show that disparities in cell type proportions are associated with several disease phenotypes in Type 1 diabetes and Parkinson's disease. imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/ .

    DOI: 10.1186/s13073-024-01338-z

    PubMed ID: 38685057

    PubMed Central ID: 5130901
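
    A minimal sketch of the linear deconvolution model that methods like imply start from, assuming bulk expression is approximately a reference signature matrix (genes by cell types) times a vector of non-negative cell-type proportions. Passing each sample its own reference matrix mimics the idea of a personalized reference panel. The function names are hypothetical, the projected-gradient solver is a simple stand-in for a dedicated NNLS routine, and none of this reproduces the imply algorithm or the ISLET API.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def deconvolve(bulk: Sequence[float], reference: Matrix,
               n_iter: int = 2000) -> List[float]:
    """Estimate cell-type proportions p >= 0 such that bulk ~= reference @ p.

    `reference` has one row per gene and one column per cell type. The
    non-negative least-squares problem is solved by projected gradient
    descent (a simple stand-in for a dedicated NNLS solver), and the
    result is rescaled to sum to 1.
    """
    n_genes, n_types = len(reference), len(reference[0])
    # 1 / ||R||_F^2 is a safe step: ||R||_F^2 bounds the largest eigenvalue of R^T R.
    step = 1.0 / sum(v * v for row in reference for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual r = R p - b
        resid = [sum(reference[g][k] * p[k] for k in range(n_types)) - bulk[g]
                 for g in range(n_genes)]
        # Gradient of 0.5 * ||R p - b||^2 is R^T r
        grad = [sum(reference[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [v / total for v in p] if total > 0 else p


def deconvolve_cohort(bulks: Sequence[Sequence[float]],
                      references: Sequence[Matrix]) -> List[List[float]]:
    """Hypothetical helper: deconvolve each sample against its own reference
    panel instead of one panel shared by the whole study population."""
    return [deconvolve(b, r) for b, r in zip(bulks, references)]
```

    For example, `deconvolve([3.7, 7.3, 5.0], [[10, 1], [1, 10], [5, 5]])` recovers proportions close to 0.3 and 0.7, because that bulk profile was constructed as 0.3 times the first reference column plus 0.7 times the second.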

  • Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. Journal of biomedical informatics Wei, Q., Yao, Z., Cui, Y., Wei, B., Jin, Z., Xu, X. 2024: 104620

    Abstract

    OBJECTIVE: Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and to provide direction for future research.

    METHODS: An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and the performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327.

    RESULTS: A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%; I² = 87%) in addressing medical queries. However, the studies varied in question sources, question-asking processes, and evaluation metrics. Under our proposed evaluation framework, many studies failed to report methodological details such as the date of inquiry, the version of ChatGPT, and inter-rater consistency.

    CONCLUSION: This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of study designs and insufficient reporting might affect the reliability of the results. Our proposed evaluation framework provides insights for future study design and for transparent reporting of LLMs responding to medical questions.

    DOI: 10.1016/j.jbi.2024.104620

    PubMed ID: 38462064
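
    The pooled accuracy and I² quoted above come from a random-effects meta-analysis. The abstract does not say which estimator or scale was used, so the sketch below assumes one common choice: DerSimonian-Laird pooling of logit-transformed accuracies, with I² computed from Cochran's Q. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def _expit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def pool_proportions(studies: List[Tuple[int, int]],
                     z: float = 1.96) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    `studies` holds (correct, total) per study and assumes 0 < correct < total.
    Returns (pooled proportion, CI lower, CI upper, I^2 as a fraction).
    """
    y = [math.log(x / (n - x)) for x, n in studies]           # logit of each accuracy
    v = [1.0 / x + 1.0 / (n - x) for x, n in studies]         # approximate variance of the logit
    w = [1.0 / vi for vi in v]                                 # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0           # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]                     # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return _expit(y_re), _expit(y_re - z * se), _expit(y_re + z * se), i2
```

    Pooling on the logit scale keeps the back-transformed interval inside (0, 1); a Freeman-Tukey transform is another common option when some studies report accuracies near 0% or 100%.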

  • Adjusted win ratio using the inverse probability of treatment weighting. Journal of biopharmaceutical statistics Wang, D., Zheng, S., Cui, Y., He, N., Chen, T., Huang, B. 2023: 1-16

    Abstract

    The win ratio method has been increasingly applied in the design and analysis of clinical trials. However, the win ratio is a univariate approach that does not allow adjustment for baseline imbalances in covariates, although a stratified win ratio can be calculated when the number of strata is small. This paper proposes an adjusted win ratio that controls for such imbalances using the inverse probability of treatment weighting (IPTW) method. We derive the adjusted win ratio and its variance and suggest three IPTW adjustments: IPTW-average treatment effect (IPTW-ATE), stabilized IPTW-ATE (SIPTW-ATE) and IPTW-average treatment effect in the treated (IPTW-ATT). The proposed adjusted methods are applied to analyse a composite outcome in the CHARM trial, and their statistical properties are assessed through simulations. Results show that the adjusted win ratio methods can correct the win ratio for covariate imbalances at baseline. In simulations, the three proposed adjusted win ratios have similar power to detect the treatment difference; their power is slightly lower than that of the corresponding adjusted Cox models when the proportional hazards assumption holds, but consistently higher when that assumption is violated.

    DOI: 10.1080/10543406.2023.2275759

    PubMed ID: 37947400
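
    A minimal sketch of how IPTW weights could enter a win ratio, assuming propensity scores are already estimated and that each treated-control pair (i, j) is counted with weight w_i * w_j. This is a plug-in illustration of the idea, not the authors' estimator, and it omits the variance derivation; the function names and the `kind` labels are hypothetical.

```python
from typing import Callable, List, Sequence


def iptw_weights(treated: Sequence[int], ps: Sequence[float],
                 kind: str = "ate") -> List[float]:
    """IPTW weights from propensity scores ps[i] = P(T=1 | X_i)."""
    p_t = sum(treated) / len(treated)  # marginal P(T=1), used only for stabilization
    weights = []
    for t, e in zip(treated, ps):
        if kind == "ate":        # IPTW-ATE
            weights.append(1.0 / e if t else 1.0 / (1.0 - e))
        elif kind == "sate":     # stabilized IPTW-ATE
            weights.append(p_t / e if t else (1.0 - p_t) / (1.0 - e))
        elif kind == "att":      # IPTW-ATT: treated keep weight 1
            weights.append(1.0 if t else e / (1.0 - e))
        else:
            raise ValueError(f"unknown weight type: {kind}")
    return weights


def weighted_win_ratio(outcome: Sequence, treated: Sequence[int],
                       weights: Sequence[float],
                       compare: Callable[[object, object], int]) -> float:
    """Win ratio in which each treated-control pair (i, j) counts w_i * w_j.

    compare(a, b) returns 1 if a wins, -1 if b wins and 0 for a tie.
    """
    wins = losses = 0.0
    for i, ti in enumerate(treated):
        if not ti:
            continue
        for j, tj in enumerate(treated):
            if tj:
                continue
            result = compare(outcome[i], outcome[j])
            if result > 0:
                wins += weights[i] * weights[j]
            elif result < 0:
                losses += weights[i] * weights[j]
    return wins / losses
```

    In this plug-in form, stabilization multiplies every treated weight by one constant and every control weight by another, so the SIPTW-ATE point estimate equals the IPTW-ATE one; stabilization would matter for the variance, which is not computed here.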

  • Evaluating the performance of ChatGPT in differential diagnosis of neurodevelopmental disorders: A pediatricians-machine comparison. Psychiatry research Wei, Q., Cui, Y., Wei, B., Cheng, Q., Xu, X. 2023; 327: 115351

    DOI: 10.1016/j.psychres.2023.115351

    PubMed ID: 37506587

  • Evidence synthesis analysis with prioritized benefit outcomes in oncology clinical trials. Journal of biopharmaceutical statistics Cui, Y., Dong, G., Kuan, P. F., Huang, B. 2023; 33 (3): 272-288

    Abstract

    Overall survival, progression-free survival, objective response/complete response, and duration of (complete) response are frequently used as the primary and secondary efficacy endpoints in the design and analysis of oncology clinical trials. However, these endpoints are typically analyzed separately. In this article, we introduce an evidence synthesis approach to prioritize the benefit outcomes by applying the generalized pairwise comparisons (GPC) method, and use win statistics (win ratio, win odds and net benefit) to quantify treatment benefit. Under the framework of GPC, the main advantage of this evidence synthesis approach is the ability to combine relevant outcomes of various types into a single summary statistic without relying on any parametric assumptions. It is particularly relevant since health authorities and the pharmaceutical industry are increasingly incorporating structured quantitative methodologies in their benefit-risk assessment. We apply this evidence synthesis approach to an oncology phase 3 study in first-line renal cell carcinoma to assess the overall effect of an investigational treatment by ranking the most clinically relevant endpoints in cancer drug development. This application and a simulation study demonstrate that the proposed approach can synthesize the evidence of treatment effect from multiple prioritized benefit outcomes, and has a substantial advantage over conventional methods that analyze each individual endpoint separately. We also introduce a newly developed R package WINS for statistical inference based on win statistics.

    DOI: 10.1080/10543406.2022.2141769

    PubMed ID: 36343174
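
    The prioritized comparison described above can be sketched as follows, under simplifying assumptions: each patient is a tuple of endpoint values in priority order where larger is better (for example survival times followed by a response indicator), censoring is ignored, and each endpoint has a margin below which a difference counts as a tie. The names and the margin parameter are illustrative; the WINS package is the authors' tool for real analyses, while this sketch covers only point estimates.

```python
from typing import List, Sequence, Tuple

Patient = Sequence[float]  # endpoint values in priority order; larger is better


def gpc_compare(a: Patient, b: Patient, margins: Sequence[float]) -> int:
    """Compare two patients endpoint by endpoint, in priority order.

    The first endpoint on which they differ by more than its margin decides
    the pair (1 = a wins, -1 = b wins); otherwise the pair is a tie.
    """
    for x, y, margin in zip(a, b, margins):
        if x - y > margin:
            return 1
        if y - x > margin:
            return -1
    return 0


def win_statistics(trt: List[Patient], ctl: List[Patient],
                   margins: Sequence[float]) -> Tuple[float, float, float]:
    """Win ratio, win odds and net benefit over all treatment-control pairs."""
    wins = losses = ties = 0
    for a in trt:
        for b in ctl:
            result = gpc_compare(a, b, margins)
            if result > 0:
                wins += 1
            elif result < 0:
                losses += 1
            else:
                ties += 1
    win_ratio = wins / losses
    win_odds = (wins + 0.5 * ties) / (losses + 0.5 * ties)
    net_benefit = (wins - losses) / (wins + losses + ties)
    return win_ratio, win_odds, net_benefit
```

    A lower-priority endpoint is consulted only when every higher-priority endpoint is tied within its margin, which is what lets outcomes of different types be combined without parametric assumptions.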

  • The stratified win statistics (win ratio, win odds, and net benefit). Pharmaceutical statistics Dong, G., Hoaglin, D. C., Huang, B., Cui, Y., Wang, D., Cheng, Y., Gamalo-Siebers, M. 2023; 22 (4): 748-756

    Abstract

    The win odds and the net benefit are related directly to each other and indirectly, through ties, to the win ratio. These three win statistics test the same null hypothesis of equal win probabilities between two groups. They provide similar p-values and powers, because the Z-values of their statistical tests are approximately equal. Thus, they can complement one another to show the strength of a treatment effect. In this article, we show that the estimated variances of the win statistics are also directly related regardless of ties or indirectly related through ties. Since its introduction in 2018, the stratified win ratio has been applied in designs and analyses of clinical trials, including Phase III and Phase IV studies. This article generalizes the stratified method to the win odds and the net benefit. As a result, the relations of the three win statistics and the approximate equivalence of their statistical tests also hold for the stratified win statistics.

    DOI: 10.1002/pst.2293

    PubMed ID: 36808217
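
    A sketch of how stratum-level counts could be combined into stratified win statistics. The default weight 1/(n_treated + n_control) per stratum is an assumed Mantel-Haenszel-type choice rather than the weighting taken from the article, and variances are not computed; the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Per-stratum summary: (wins, losses, ties, n_treated, n_control)
Stratum = Tuple[int, int, int, int, int]


def stratified_win_statistics(strata: Sequence[Stratum],
                              weights: Optional[Sequence[float]] = None
                              ) -> Tuple[float, float, float]:
    """Combine stratum-level win/loss/tie counts into stratified win statistics.

    The default weight 1 / (n_treated + n_control) is an assumed
    Mantel-Haenszel-type choice, not necessarily the article's.
    """
    if weights is None:
        weights = [1.0 / (nt + nc) for _, _, _, nt, nc in strata]
    w_sum = sum(w * s[0] for w, s in zip(weights, strata))  # weighted wins
    l_sum = sum(w * s[1] for w, s in zip(weights, strata))  # weighted losses
    t_sum = sum(w * s[2] for w, s in zip(weights, strata))  # weighted ties
    win_ratio = w_sum / l_sum
    win_odds = (w_sum + 0.5 * t_sum) / (l_sum + 0.5 * t_sum)
    net_benefit = (w_sum - l_sum) / (w_sum + l_sum + t_sum)
    return win_ratio, win_odds, net_benefit
```

    Because all three statistics are computed from the same weighted win, loss and tie totals, the identity win odds = (1 + net benefit) / (1 - net benefit) holds for the stratified versions as well, in line with the abstract's conclusion.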

  • Win statistics (win ratio, win odds, and net benefit) can complement one another to show the strength of the treatment effect on time-to-event outcomes. Pharmaceutical statistics Dong, G., Huang, B., Verbeeck, J., Cui, Y., Song, J., Gamalo-Siebers, M., Wang, D., Hoaglin, D. C., Seifu, Y., Mütze, T., Kolassa, J. 2023; 22 (1): 20-33

    Abstract

    Conventional analyses of a composite of multiple time-to-event outcomes use the time to the first event. However, the first event may not be the most important outcome. To address this limitation, generalized pairwise comparisons and win statistics (win ratio, win odds, and net benefit) have become popular and have been applied to clinical trial practice. However, win ratio, win odds, and net benefit have typically been used separately. In this article, we examine the use of these three win statistics jointly for time-to-event outcomes. First, we explain the relation of point estimates and variances among the three win statistics, and the relation between the net benefit and the Mann-Whitney U statistic. Then we explain that the three win statistics are based on the same win proportions, and they test the same null hypothesis of equal win probabilities in two groups. We show theoretically that the Z-values of the corresponding statistical tests are approximately equal; therefore, the three win statistics provide very similar p-values and statistical powers. Finally, using simulation studies and data from a clinical trial, we demonstrate that, when there is little or no censoring, the three win statistics can complement one another to show the strength of the treatment effect. However, when the amount of censoring is not small, and without adjustment for censoring, the win odds and the net benefit may have an advantage for interpreting the treatment effect; with adjustment for censoring (e.g., inverse probability of censoring weighting, IPCW), the three win statistics can complement one another to show the strength of the treatment effect. For calculations we use the R package WINS, available on CRAN (the Comprehensive R Archive Network).

    DOI: 10.1002/pst.2251

    PubMed ID: 35757986
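
    The relations mentioned above follow from the win, loss and tie proportions alone. The sketch below checks them for a single uncensored outcome, which is a simplifying assumption: with censoring, some pairs are indeterminate and the adjustments discussed in the abstract would be needed. The function names are hypothetical.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Number of (x_i, y_j) pairs with x_i > y_j, plus half the tied pairs."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def net_benefit_from_u(u: float, n_x: int, n_y: int) -> float:
    # NB = P(win) - P(loss) = 2 * U / (n_x * n_y) - 1
    return 2.0 * u / (n_x * n_y) - 1.0


def win_odds_from_net_benefit(nb: float) -> float:
    # WO = (P(win) + P(tie)/2) / (P(loss) + P(tie)/2) = (1 + NB) / (1 - NB)
    return (1.0 + nb) / (1.0 - nb)


def win_ratio_from_net_benefit(nb: float, p_tie: float) -> float:
    # P(win) = (1 - p_tie + NB) / 2 and P(loss) = (1 - p_tie - NB) / 2
    return (1.0 - p_tie + nb) / (1.0 - p_tie - nb)
```

    The win odds depends on the net benefit alone, whereas the win ratio also needs the tie proportion; this is the "directly" versus "indirectly, through ties" distinction drawn in the abstracts above.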

  • Assessing dynamic covariate effects with survival data. Lifetime data analysis Cui, Y., Peng, L. 2022; 28 (4): 675-699

    Abstract

    Dynamic (or varying) covariate effects often reflect meaningful physiological mechanisms underlying chronic diseases. However, standard approaches to evaluating disease prognostic factors typically adopt a static view of covariate effects, which can cause some important disease markers to be undervalued. To address this issue, we take the perspective of globally concerned quantile regression and propose a flexible testing framework suited to assessing either constant or dynamic covariate effects. We study powerful Kolmogorov-Smirnov (K-S) and Cramér-von Mises (C-V) type test statistics and develop a simple resampling procedure to handle their complicated limit distributions. We provide rigorous theoretical results, including the limiting null distributions of the proposed tests and their consistency under a general class of alternative hypotheses, as well as justification for the resampling procedure. Extensive simulation studies and a real data example demonstrate the utility of the new testing procedures and their advantages over existing approaches in assessing dynamic covariate effects.

    DOI: 10.1007/s10985-022-09571-7

    PubMed ID: 35962886

    PubMed Central ID: PMC9901566
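
    A sketch of the two test functionals, assuming the coefficient process β(τ) has already been estimated on an equally spaced grid of quantile levels and that resampled statistics are supplied by a separate resampling step. Neither the censored quantile regression fit nor the authors' resampling procedure is shown, the usual √n scaling is omitted, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def ks_cvm(beta: Sequence[float], grid: Sequence[float],
           null: str = "constant") -> Tuple[float, float]:
    """K-S and C-vM type functionals of an estimated coefficient process.

    beta[k] estimates the covariate effect at quantile level grid[k]; the grid
    is assumed equally spaced. null="constant" measures deviation from the
    average effect, null="zero" measures deviation from no effect.
    """
    if null == "constant":
        center = sum(beta) / len(beta)
    elif null == "zero":
        center = 0.0
    else:
        raise ValueError(f"unknown null: {null}")
    dev = [b - center for b in beta]
    ks = max(abs(d) for d in dev)            # sup-norm (Kolmogorov-Smirnov type)
    step = grid[1] - grid[0]
    cvm = sum(d * d for d in dev) * step     # Riemann sum of the squared deviation (Cramér-von Mises type)
    return ks, cvm


def resampling_p_value(observed: float, resampled: Sequence[float]) -> float:
    """Share of resampled statistics at least as large as the observed one,
    with the usual +1 correction so the p-value is never exactly zero."""
    exceed = sum(1 for s in resampled if s >= observed)
    return (1 + exceed) / (1 + len(resampled))
```

    The K-S type statistic reacts to a single large departure at some quantile level, while the C-vM type statistic accumulates moderate departures across the whole range, which is why the two are often reported together.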

  • Assessing the Reproducibility of Microbiome Measurements Based on Concordance Correlation Coefficients. Journal of the Royal Statistical Society. Series C, Applied statistics Cui, Y., Peng, L., Hu, Y., Lai, H. J. 2021; 70 (4): 1027-1048

    Abstract

    Evaluating the reproducibility or agreement of microbiome measurements is often a crucial step to ensure rigorous downstream analyses in microbiome studies. In this paper, we address this need by developing adaptations of Lin's concordance correlation coefficient (CCC) tailored to microbiome studies. We introduce a general formulation of the new CCC measures based on a distance function that appropriately characterizes the discrepancy between microbiome compositional measurements. We thoroughly study the special cases that adopt the Euclidean distance and the Aitchison distance. Our proposals account for the unique features of microbiome compositional data, including high dimensionality, dependency among individual relative abundances, and the presence of many zeros. We further investigate a practical compound approach to better understand the sources of data inconsistency. Extensive simulation studies are conducted to evaluate the utility of the proposed methods in realistic scenarios. We also apply the proposed methods to a microbiome validation dataset from the Feeding Infants Right... from the STart (FIRST) study. Our analyses offer useful insight into the extent of data variation resulting from two different experimental procedures, as well as its heterogeneous patterns across genera.

    DOI: 10.1111/rssc.12497

    PubMed ID: 34776546

    PubMed Central ID: PMC8587529

  • An empirical Bayesian approach for testing gene expression fold change and its application in detecting global dosage effects. NAR genomics and bioinformatics Guo, Z., Cui, Y., Shi, X., Birchler, J. A., Albizua, I., Sherman, S. L., Qin, Z. S., Ji, T. 2020; 2 (3): lqaa072

    Abstract

    We are motivated by biological studies intended to understand global gene expression fold change. Biologists have generally adopted a fixed cutoff to determine the significance of fold changes in gene expression studies (e.g., using an observed fold change of two as a fixed threshold). Scientists can also use a t-test or a modified differential expression test to assess the significance of fold changes. However, these methods either fail to take advantage of the high dimensionality of gene expression data or fail to test fold change directly. Our research develops a new empirical Bayesian approach to substantially improve the power and accuracy of fold-change detection. Specifically, we estimate the gene-wise error variation in the log of the fold change more accurately, and we then adopt a t-test with adjusted degrees of freedom for significance assessment. We apply our method to a dosage study in Arabidopsis and a Down syndrome study in humans to illustrate the utility of our approach. We also present a simulation study based on real datasets to demonstrate the accuracy of our method in error variance estimation and its power in fold-change detection. Our R package, with a detailed user manual, is publicly available on GitHub at https://github.com/cuiyingbeicheng/Foldseq.

    DOI: 10.1093/nargab/lqaa072

    PubMed ID: 33575620

    PubMed Central ID: PMC7671412
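
    The abstract does not spell out the prior used for the gene-wise variances. As a swapped-in, well-known example of the same idea, the sketch below uses limma-style variance moderation: the pooled variance behind the log2 fold change is shrunk toward a prior value s0² carrying d0 prior degrees of freedom, and the t statistic gets d0 + d degrees of freedom. The hyperparameters are assumed to be estimated across genes elsewhere, `target_log2_fc` is a hypothetical parameter for testing against a non-unit ratio such as log2(1.5), and this is not the Foldseq implementation.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def moderated_fold_change_t(group1: Sequence[float], group2: Sequence[float],
                            d0: float, s0_sq: float,
                            target_log2_fc: float = 0.0) -> Tuple[float, float]:
    """limma-style moderated t statistic for a log2 fold change of one gene.

    group1, group2: log2 expression replicates in the two conditions.
    d0, s0_sq: prior degrees of freedom and prior variance, assumed to have
    been estimated across all genes beforehand (that step is not shown).
    Returns (t statistic, degrees of freedom).
    """
    n1, n2 = len(group1), len(group2)
    lfc = mean(group2) - mean(group1)                 # observed log2 fold change
    d = n1 + n2 - 2                                   # residual degrees of freedom
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)   # shrink toward the prior variance
    se = math.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))
    return (lfc - target_log2_fc) / se, d0 + d
```

    Converting t to a p-value needs the CDF of the t distribution, which the Python standard library does not provide (`scipy.stats.t.sf` is one option). With few replicates per gene, the extra d0 degrees of freedom are what make the moderated test more powerful than a plain t-test.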

  • Pulmonary Disease Burden in Primary Immune Deficiency Disorders: Data from USIDNET Registry. Journal of clinical immunology Patrawala, M., Cui, Y., Peng, L., Fuleihan, R. L., Garabedian, E. K., Patel, K., Guglani, L. 2020; 40 (2): 340-349

    Abstract

    Pulmonary manifestations are common in patients with primary immunodeficiency disorders (PIDs), but their prevalence, specific diseases, and patterns are not well characterized. We conducted a retrospective analysis of pulmonary diseases reported in the database of the United States Immunodeficiency Network (USIDNET), a program of the Immune Deficiency Foundation. PIDs were categorized into 10 groups, and their demographics, pulmonary diagnoses and procedures, infections, prophylaxis regimens, and laboratory findings were analyzed. A total of 1937 patients with various PIDs (39.3% of total patients; 49.6% male; average age 37.9 years, SD = 22.4 years) were noted to have a pulmonary disease comorbidity. Pulmonary diseases were grouped into broad categories: airway (86.8%), parenchymal (18.5%), pleural (4.6%), vascular (4.3%), and other (13.9%) disorders. Common variable immune deficiency (CVID) accounted for almost half of PIDs associated with airway, parenchymal, and other pulmonary disorders. Pulmonary procedures performed in 392 patients were mostly diagnostic (77.3%) or therapeutic (16.3%). These patients were receiving a wide variety of treatments, including immunoglobulin replacement (82.1%), immunosuppressive (32.2%), anti-inflammatory (12.7%), biologic (9.3%), and cytokine-based (7.6%) therapies. Prophylactic therapy included antibiotic (18.1%), antifungal (3.3%), and antiviral (2.2%) medications, and 7.1% of patients were on long-term oxygen therapy due to advanced lung disease. Pulmonary manifestations are common in individuals with PID, but long-term pulmonary outcomes are not well known in this group of patients. Further longitudinal follow-up will help define the long-term prognosis of respiratory comorbidities and the optimal treatment modalities.

    DOI: 10.1007/s10875-019-00738-w

    PubMed ID: 31919711