Professional Education


  • PhD, Johns Hopkins Bloomberg School of Public Health, Epidemiology-Cancer Epidemiology (2022)
  • SM, Harvard T.H. Chan School of Public Health, Environmental Health-Environmental Epidemiology (2017)
  • Bachelor of Medicine, Shandong University, Preventive Medicine (2015)

All Publications


  • Risk of dementia due to Co-exposure to air pollution and neighborhood disadvantage. Environmental research Frndak, S., Deng, Z., Ward-Caviness, C. K., Gorski-Steiner, I., Thorpe, R. J., Dickerson, A. S. 2024: 118709

    Abstract

    BACKGROUND: Co-exposure to air pollution and neighborhood disadvantage may influence cognitive decline. We tested these associations in the context of dementia risk. METHODS: We leveraged a cohort of adults ≥65 years (n = 5397) enrolled from 2011 to 2018 in the National Health and Aging Trends Study (NHATS). Air pollutants (particulate matter [PM] ≤ 10 μm in diameter, PM ≤ 2.5 μm in diameter, carbon monoxide, nitric oxide, and nitrogen dioxide) and neighborhood disadvantage were tested for joint associations with dementia risk. Pollutant concentrations at the 2010 census tract level were assigned using the US Environmental Protection Agency's Community Multiscale Air Quality Modeling System. Neighborhood disadvantage was defined using the tract Social Deprivation Index (SDI). Dementia was determined through self- or proxy-report or scores indicative of "probable dementia" according to NHATS screening tools. Joint effects of air pollutants and SDI were tested using quantile g-computation Cox proportional hazards models. We also stratified joint air pollution effects across SDI tertiles. Analyses adjusted for age at enrollment, sex, education, partner status, urbanicity, income, race and ethnicity, years at residence, census segregation, and census region. RESULTS: SDI score (aHR = 1.08; 95% CI 0.96, 1.22), joint air pollution (aHR = 1.03; 95% CI 0.92, 1.16) and joint SDI with air pollution (aHR = 1.04; 95% CI 0.89, 1.22) were not associated with dementia risk. After accounting for competing risk of death, joint SDI with air pollution was not associated with dementia risk (aHR = 1.06; 95% CI 0.87, 1.29). In stratified models, joint air pollution was associated with greater risk of dementia at high (aHR = 1.19; 95% CI 0.87, 1.63), but not at medium or low SDI. CONCLUSION: Air pollution was associated with greater dementia risk in disadvantaged areas after accounting for competing risks. Air pollution associations with dementia incidence may be attenuated when other risk factors are more prominent in disadvantaged neighborhoods.

    View details for DOI 10.1016/j.envres.2024.118709

    View details for PubMedID 38493859

  • Evaluation of PREDICT: a prognostic risk tool, after diagnosis of a second breast cancer. JNCI cancer spectrum Deng, Z., Jones, M. R., Wolff, A. C., Visvanathan, K. 2023

    Abstract

    BACKGROUND: PREDICT is a clinical tool widely used to estimate the prognosis of early-stage breast cancer (BC). The performance of PREDICT for a second primary BC is unknown. METHOD: Women 18 years and older, diagnosed with a first or second invasive BC between 2000 and 2013 and followed for at least 5 years, were identified from the US Surveillance, Epidemiology, and End Results (SEER) database. Model calibration of PREDICT was evaluated by comparing predicted (P) and observed (O) 5-year BC-specific mortality separately by estrogen receptor (ER) status for first vs second BC. Receiver-operator curves and area under the curve (AUC) were used to assess model discrimination. Model performance was also evaluated for various races and ethnicities. RESULTS: The study population included 6,729 women diagnosed with a second BC and 357,204 women with a first BC. Overall, PREDICT demonstrated good discrimination for first and second BCs (AUCs ranging 0.73-0.82). PREDICT significantly underestimated 5-year BC mortality for second ER-positive BCs (P-O = -6.24%, 95% CI: -6.96%, -5.49%). Among women with a first ER-positive cancer, model calibration was good (P-O = -0.22%, 95% CI: -0.29%, -0.15%), except in Non-Hispanic Black women (P-O = -2.33%, 95% CI: -2.65%, -2.01%) and women ≥80 years of age (P-O = -3.75%, 95% CI: -4.12%, -3.41%). PREDICT performed well for second ER-negative cancers overall (P-O = -1.69%, 95% CI: -3.99%, 0.16%), but underestimated mortality among those who previously received chemotherapy or had a first cancer with more aggressive tumor characteristics. In contrast, PREDICT overestimated mortality for first ER-negative cancers (P-O = 4.54%, 95% CI: 4.27%, 4.86%). CONCLUSION: PREDICT underestimated 5-year mortality after a second ER-positive BC and in certain subgroups of women with a second ER-negative BC.

    View details for DOI 10.1093/jncics/pkad081

    View details for PubMedID 37773987

  • Associations between race/ethnicity and SEER-CAHPS patient care experiences among female Medicare beneficiaries with breast cancer. Journal of geriatric oncology Dibble, K. E., Deng, Z., Jin, M., Connor, A. E. 2023; 14 (8): 101633

    Abstract

    INTRODUCTION: We aimed to determine if racial/ethnic disparities exist in survivorship care patient experiences among older breast cancer survivors. MATERIALS AND METHODS: Nineteen thousand seventeen female breast cancer survivors aged ≥65 at post-diagnosis survey contributed data via the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) and Centers for Medicare and Medicaid Services Consumer Assessment of Healthcare Providers & Systems (CAHPS) data linkage (2000-2019). Multivariable linear regression models were used to estimate adjusted beta (β) coefficients and standard error (SE) estimates for associations between race/ethnicity and survivorship care patient experiences. RESULTS: Most women were non-Hispanic (NH)-White (78.1%; NH-Black [8.1%], NH-Asian [6.5%], Hispanic [6.2%]). On average, women were 76.3 years old (standard deviation [SD] = 7.14) at CAHPS survey and 6.10 years from primary diagnosis (SD = 3.51). Compared with NH-White survivors, NH-Black survivors reported lower mean scores for Getting Care Quickly (β = -5.17, SE = 0.69, p ≤ 0.001), Getting Needed Care (β = -1.72, SE = 0.63, p = 0.006), and Overall Care Ratings (β = -2.72, SE = 0.48, p ≤ 0.001), mirroring the results for NH-Asian survivors (Getting Care Quickly [β = -7.06, SE = 0.77, p ≤ 0.001], Getting Needed Care [β = -4.43, SE = 0.70, p ≤ 0.001], Physician Communication [β = -1.15, SE = 0.54, p = 0.03], Overall Care Rating [β = -2.32, SE = 0.53, p ≤ 0.001]). Findings among Hispanic survivors varied, where mean scores were lower for Getting Care Quickly (β = -2.83, SE = 0.79, p ≤ 0.001), Getting Needed Care (β = -2.43, SE = 0.70, p = 0.001), and Getting Needed Prescription Drug(s) (β = -1.47, SE = 0.64, p = 0.02), but were higher for Health Plan Rating (β = 2.66, SE = 0.55, p ≤ 0.001). Education, Medicare plan, and multimorbidity significantly modified various associations among NH-Black survivors, and education was a significant modifier among NH-Asian and Hispanic survivors. DISCUSSION: We observed racial/ethnic disparities in the associations with survivorship care patient experience among NH-Black, Hispanic, and NH-Asian breast cancer survivors. Future research should examine the impact of education, Medicare plans, and multimorbidity on these associations.

    View details for DOI 10.1016/j.jgo.2023.101633

    View details for PubMedID 37741036

  • Lifetime body weight trajectories and risk of renal cell cancer: a large US prospective cohort study. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology Deng, Z., Hajihosseini, M., Moore, J. X., Khan, S., Graff, R. E., Bondy, M. L., Chung, B. I., Langston, M. E. 2023

    Abstract

    Body mass index (BMI) is a known risk factor for renal cell cancer (RCC), but data are limited as to the effect of lifetime exposure to excess bodyweight. Using the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial (N = 138,614; 527 incident RCCs), we identified several anthropometric measures to capture lifetime BMI patterns: 1) BMI at specific ages; 2) adulthood BMI trajectories; 3) cumulative exposure to overweight/obesity denoted as weighted years of living overweight/obese (WYO); and 4) weight change during each age span. We fitted multivariable Cox models to quantify the association between each anthropometric metric and incident RCC. A higher BMI at ages 20 and 50 and at baseline was associated with a greater hazard of RCC. Compared to individuals who retained normal BMI throughout adulthood, we observed an increased hazard of RCC for BMI trajectories progressing from normal BMI to overweight (HR: 1.49, 95% CI: 1.19, 1.87), from normal BMI to obesity (HR: 2.22, 95% CI: 1.70, 2.90), and from overweight to obesity (HR: 2.78, 95% CI: 1.81, 4.27). Compared to individuals who were never overweight (WYO = 0), elevated HRs were observed among individuals in the low (HR: 1.31, 95% CI: 0.99, 1.74), medium (HR: 1.57, 95% CI: 1.20, 2.05), and high (HR: 2.10, 95% CI: 1.62, 2.72) WYO tertiles. Weight gain of ≥10 kg was associated with increased RCC incidence for each age span. Across the lifespan, being overweight/obese, weight gain, and higher cumulative exposure to excess weight were all associated with increased RCC risk. It is important to avoid weight gain and assess BMI from a life-course perspective to reduce RCC risk.

    View details for DOI 10.1158/1055-9965.EPI-23-0668

    View details for PubMedID 37624040

  • Outdoor Air Pollution, Environmental Injustice, and Cognitive Decline: a Review CURRENT EPIDEMIOLOGY REPORTS Dickerson, A. S., Frndak, S., Gorski-Steiner, I., Deng, Z., Jenson, T. E., Mohan, A., Kim, J., Boerner, V., Thorpe Jr, R. J. 2023
  • Differences in survivorship care experiences among older breast cancer survivors by clinical cancer characteristics, race/ethnicity, and socioeconomic factors: A SEER-CAHPS study BREAST CANCER RESEARCH AND TREATMENT Dibble, K. E., Deng, Z., Connor, A. E. 2023: 565-582

    Abstract

    To determine if disparities exist in survivorship care experiences among older breast cancer survivors by breast cancer characteristics, race/ethnicity, and socioeconomic factors. A total of 19,017 female breast cancer survivors (≥ 65 at post-diagnosis survey) contributed data via SEER-CAHPS data linkage (2000-2019). Analyses included overall and stratified multivariable linear regression to estimate beta (β) coefficients and standard errors (SE) to identify relationships between clinical cancer characteristics and survivorship care experiences. Minority survivors were mostly non-Hispanic (NH)-Black (8.1%) or NH-Asian (6.5%). Survivors were 76.3 years (SD = 7.14) at CAHPS survey and were 6.10 years (SD = 3.51) post-diagnosis on average. Survivors with regional breast cancer vs. localized at diagnosis (β = 1.00, SE = 0.46, p = 0.03) or treated with chemotherapy vs. no chemotherapy/unknown (β = 1.05, SE = 0.48, p = 0.03) reported higher mean scores for Getting Needed Care. Results were similar for Overall Care Ratings (β = 0.87, SE = 0.38, p = 0.02) among women treated with chemotherapy. Conversely, women diagnosed with distant breast cancer vs. localized reported lower mean scores for Physician Communication (β = -1.94, SE = 0.92, p = 0.03). Race/ethnicity, education, and area-level poverty significantly modified several associations between stage, estrogen receptor status, treatments, and various CAHPS outcomes. These study findings can be used to inform survivorship care providers treating women diagnosed with more advanced stage and aggressive disease. The disparities we observed among minority groups and by socioeconomic status should be further evaluated in future research as these interactions could impact long-term outcomes, including survival.

    View details for DOI 10.1007/s10549-023-06948-6

    View details for Web of Science ID 000974778300001

    View details for PubMedID 37093399

    View details for PubMedCentralID 8478795

  • The impact of multimorbidity on the relationship between breast cancer tumor characteristics and survivorship care experiences among older women: A SEER-CAHPS analysis Dibble, K. E., Deng, Z., Connor, A. E. AMER ASSOC CANCER RESEARCH. 2023
  • The Impact of Cardiovascular Disease Risk on Cancer Progression among Female Breast Cancer Survivors: A Longitudinal Study within The Boss Cohort Feng, X., Deng, Z., McCollough, M., May, B., Selznick, E., Connor, A., Armstrong, D. K., Visvanathan, K. AMER ASSOC CANCER RESEARCH. 2023
  • Racial/ethnic disparities in perceived quality of breast cancer survivorship care among older women by general health status: A SEER-CAHPS study Dibble, K. E., Deng, Z., Connor, A. E. AMER ASSOC CANCER RESEARCH. 2023: 313
  • Racial and ethnic disparities in mortality among breast cancer survivors after a second malignancy JNCI-JOURNAL OF THE NATIONAL CANCER INSTITUTE Deng, Z., Jones, M. R., Wang, M., Wolff, A. C., Visvanathan, K. 2022: 279-287

    Abstract

    Racial and ethnic differences in survival after a first cancer are well established but have not been examined after a second primary cancer (SPC) despite the increasing incidence among survivors. We examined 39 029 female breast cancer survivors who developed an SPC between 2000 and 2014 in the Surveillance, Epidemiology, and End Results 18 database. Multivariable Cox proportional hazards regression for competing risks data was used to estimate hazard ratios (HR) and 95% confidence intervals (CI) for cancer and cardiovascular disease mortality after SPCs comparing Hispanic, Non-Hispanic Asian, and Non-Hispanic Black survivors with Non-Hispanic White survivors. Models were adjusted for sociodemographics, tumor characteristics, and treatments of the first and second cancer. Analyses were stratified by SPC type. During 17 years of follow-up, there were 15 117 deaths after SPCs. The risk of cancer death was 12% higher among Non-Hispanic Black survivors (HR = 1.12, 95% CI = 1.05 to 1.19) and 8% higher among Hispanic survivors (HR = 1.08, 95% CI = 1.00 to 1.16) compared with Non-Hispanic White survivors. In subgroup analyses, the strongest associations were observed among Non-Hispanic Black survivors with a second breast or uterine cancer and among Hispanic survivors with a second breast cancer. Non-Hispanic Black survivors also experienced a 44% higher risk of cardiovascular disease death after SPC diagnosis than Non-Hispanic White survivors (HR = 1.44, 95% CI = 1.20 to 1.74). Higher cancer mortality among Non-Hispanic Black and Hispanic survivors and higher cardiovascular mortality among Non-Hispanic Black survivors exist among women who survive a first breast cancer to develop an SPC. Studies focused on identifying the contributors to these disparities are needed to enable implementation of effective mitigation strategies.

    View details for DOI 10.1093/jnci/djac220

    View details for Web of Science ID 000915105900001

    View details for PubMedID 36529890

    View details for PubMedCentralID PMC9996210

  • Mortality after second malignancy in breast cancer survivors compared to a first primary cancer: a nationwide longitudinal cohort study NPJ BREAST CANCER Deng, Z., Jones, M. R., Wang, M., Visvanathan, K. 2022; 8 (1): 82

    Abstract

    Limited information exists about survival outcomes after second primary cancers (SPCs) among breast cancer survivors. Studies suggest that mortality after certain SPCs may be higher than mortality after first primary cancers (FPCs) of the same type. A cohort study was conducted among 63,424 US women using the Surveillance, Epidemiology, and End Results 18 database (2000-2016) to compare mortality after an SPC among breast cancer survivors to mortality among women after an FPC using Cox proportional hazard regression. Propensity scores were used to match survivors with SPCs to women with FPCs 1:1 based on cancer type and prognostic factors. During a median follow-up of 42 months, 11,532 cancer deaths occurred after SPCs among survivors compared to 9,305 deaths after FPCs. Cumulative cancer mortality was 44.7% for survivors with SPCs and 35.2% for women with FPCs. Survivors with SPCs had higher risk of cancer death (hazard ratio (HR): 1.27, 95% CI: 1.23-1.30) and death overall (HR: 1.18, 95% CI: 1.15-1.21) than women with FPCs. Increased risk of cancer death after SPCs compared to FPCs was observed for cancer in breast, lung, colon and/or rectum, uterus, lymphoma, melanoma, thyroid, and leukemia. Estrogen receptor status and treatment of the prior breast cancer as well as time between prior breast cancer and SPC significantly modified the mortality difference between women with SPC and FPC. A more tailored approach to early detection and treatment could improve outcomes from second cancer in breast cancer survivors.

    View details for DOI 10.1038/s41523-022-00447-5

    View details for Web of Science ID 000825409500001

    View details for PubMedID 35835760

    View details for PubMedCentralID PMC9283416

  • Validation of the PREDICT breast cancer tool in a multiethnic population of US women after a second primary breast cancer. Deng, Z., Jones, M. R., Wang, M., Visvanathan, K. LIPPINCOTT WILLIAMS & WILKINS. 2022
  • Racial/ethnic disparities in cancer mortality after a second breast cancer Deng, Z., Jones, M. R., Wang, M., Visvanathan, K. AMER ASSOC CANCER RESEARCH. 2022
  • Associations of prenatal exposure to mixtures of organochlorine pesticides and smoking and drinking behaviors in adolescence ENVIRONMENTAL RESEARCH Dickerson, A. S., Deng, Z., Ransome, Y., Factor-Litvak, P., Karlsson, O. 2022; 206: 112431

    Abstract

    It is important to identify the factors that influence the prevalence of disinhibitory behaviors, as tobacco and alcohol use in adolescence is a strong predictor of continued use and substance abuse into adulthood. Organochlorine pesticides (OCPs) are persistent organic pollutants that pose a potential risk to the developing fetus and offspring long-term health. We examined associations between prenatal exposure to OCPs and their metabolites (i.e., p,p'-DDT, p,p'-DDE, o,p'-DDT, oxychlordane, and hexachlorobenzene (HCB)), both as a mixture and as single compounds, and alcohol consumption and smoking at adolescence in a sample (n = 554) from the Child Health and Development Studies prospective birth cohort. Bayesian Kernel Machine Regression demonstrated a trend of higher risk of alcohol use and smoking with higher quartile mixture levels. Single-component analysis showed increased odds of smoking and drinking with increases in lipid-adjusted p,p'-DDE serum levels (aOR = 2.06, 95% CI 0.99-4.31, p = 0.05, per natural log unit increase). We found significant effect modification in these associations by sex: in female adolescents, higher p,p'-DDT serum levels (aOR = 0.26, 95% CI 0.09-0.76, p = 0.01, per natural log unit increase) were associated with lower odds of smoking and drinking, while higher p,p'-DDE serum levels (aOR = 2.98, 95% CI 1.04-8.51, p = 0.04, per natural log unit increase) were associated with higher odds of the outcomes. Results of the mutually adjusted model were not significant for male adolescents. Further research to understand the reasons for these sex differences is warranted.

    View details for DOI 10.1016/j.envres.2021.112431

    View details for Web of Science ID 000766430700010

    View details for PubMedID 34848208

  • Increased cancer mortality after second primary malignancy among breast cancer survivors Deng, Z., Jones, M. R., Visvanathan, K. AMER ASSOC CANCER RESEARCH. 2021
  • Shorter survival and later stage at diagnosis among unmarried patients with cutaneous melanoma: A US national and tertiary care center study JOURNAL OF THE AMERICAN ACADEMY OF DERMATOLOGY Rachidi, S., Deng, Z., Sullivan, D. Y., Lipson, E. J. 2020; 83 (4): 1012-1020

    Abstract

    Addressing risk factors of delayed melanoma detection minimizes disparities in outcome. To elucidate the significance of marital status in melanoma outcomes across anatomic sites. Retrospective cohort study of 73,558 patients from the Surveillance, Epidemiology, and End Results (SEER) program and 2992 patients at Johns Hopkins University. Patients were stratified by marital status, anatomic site, age, and sex. Endpoints were prevalence of advanced melanoma (stages III or IV) and survival. In the SEER cohort, single patients were more likely than married patients to present in stages III or IV among both men (prevalence ratio [PR], 1.45; 95% confidence interval [CI], 1.37-1.53) and women (PR, 1.28; 95% CI, 1.18-1.39). This trend was consistent across all anatomic sites and in all age groups, particularly in those 18 to 68 years old. Overall and cancer-specific survival times were shorter in unmarried patients. Similarly, at Johns Hopkins, single patients had increased prevalence of advanced melanoma (PR, 1.54; 95% CI, 1.21-1.94) and experienced shorter overall survival (hazard ratio, 1.51; 95% CI, 1.15-1.99). The anatomic sites were not very specific, and this was a retrospective study. Unmarried patients, especially men and those younger than 68 years, are diagnosed at more advanced stages, even in readily visible sites such as the face. They also experience worse survival independent of stage.

    View details for DOI 10.1016/j.jaad.2020.05.088

    View details for Web of Science ID 000569306800006

    View details for PubMedID 32446825

  • Performance of Breast Cancer Risk-Assessment Models in a Large Mammography Cohort JNCI-JOURNAL OF THE NATIONAL CANCER INSTITUTE McCarthy, A., Guan, Z., Welch, M., Griffin, M. E., Sippo, D. A., Deng, Z., Coopey, S. B., Acar, A., Semine, A., Parmigiani, G., Braun, D., Hughes, K. S. 2020; 112 (5): 489-497

    Abstract

    Several breast cancer risk-assessment models exist. Few studies have evaluated predictive accuracy of multiple models in large screening populations. We evaluated the performance of the BRCAPRO, Gail, Claus, Breast Cancer Surveillance Consortium (BCSC), and Tyrer-Cuzick models in predicting risk of breast cancer over 6 years among 35 921 women aged 40-84 years who underwent mammography screening at Newton-Wellesley Hospital from 2007 to 2009. We assessed model discrimination using the area under the receiver operating characteristic curve (AUC) and assessed calibration by comparing the ratio of observed-to-expected (O/E) cases. We calculated the square root of the Brier score and positive and negative predictive values of each model. Our results confirmed the good calibration and comparable moderate discrimination of the BRCAPRO, Gail, Tyrer-Cuzick, and BCSC models. The Gail model had slightly better O/E ratio and AUC (O/E = 0.98, 95% confidence interval [CI] = 0.91 to 1.06, AUC = 0.64, 95% CI = 0.61 to 0.65) compared with BRCAPRO (O/E = 0.94, 95% CI = 0.88 to 1.02, AUC = 0.61, 95% CI = 0.59 to 0.63) and Tyrer-Cuzick (version 8, O/E = 0.84, 95% CI = 0.79 to 0.91, AUC = 0.62, 95% CI = 0.60 to 0.64) in the full study population, and the BCSC model had the highest AUC among women with available breast density information (O/E = 0.97, 95% CI = 0.89 to 1.05, AUC = 0.64, 95% CI = 0.62 to 0.66). All models had poorer predictive accuracy for human epidermal growth factor receptor 2 positive and triple-negative breast cancers than hormone receptor positive human epidermal growth factor receptor 2 negative breast cancers. In a large cohort of patients undergoing mammography screening, existing risk prediction models had similar, moderate predictive accuracy and good calibration overall. Models that incorporate additional genetic and nongenetic risk factors and estimate risk of tumor subtypes may further improve breast cancer risk prediction.

    View details for DOI 10.1093/jnci/djz177

    View details for Web of Science ID 000537454900009

    View details for PubMedID 31556450

    View details for PubMedCentralID PMC7225681

  • Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes JCO CLINICAL CANCER INFORMATICS Bao, Y., Deng, Z., Wang, Y., Kim, H., Armengol, V., Acevedo, F., Ouardaoui, N., Wang, C., Parmigiani, G., Barzilay, R., Braun, D., Hughes, K. S. 2019; 3: 1-9

    Abstract

    The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools that help to monitor and prioritize the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated data set for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM) which learns a linear decision rule on the basis of the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) which learns a complex nonlinear decision rule on the basis of the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. For penetrance classification, we annotated 3,740 paper titles and abstracts and evaluated the two models using 10-fold cross-validation. The SVM model achieved 88.93% accuracy (the percentage of papers that were correctly classified), whereas the CNN model achieved 88.53% accuracy. For prevalence classification, we annotated 3,753 paper titles and abstracts. The SVM model achieved 88.92% accuracy and the CNN model achieved 88.52% accuracy. Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.

    View details for DOI 10.1200/CCI.19.00042

    View details for Web of Science ID 000488810500001

    View details for PubMedID 31545655

    View details for PubMedCentralID PMC6873946

  • Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance. JCO clinical cancer informatics Deng, Z., Yin, K., Bao, Y., Armengol, V. D., Wang, C., Tiwari, A., Barzilay, R., Parmigiani, G., Braun, D., Hughes, K. S. 2019; 3: 1-9

    Abstract

    Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier which classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure. We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining, followed by human review of identified studies, with the traditional procedure that requires human review of all studies. Ten high-quality gene-cancer penetrance meta-analyses spanning 16 gene-cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage). Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts v 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (we were able to identify 132 of 142 studies) before reviewing references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies). We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human efforts in the literature review process.

    View details for DOI 10.1200/CCI.19.00043

    View details for PubMedID 31419182

    View details for PubMedCentralID PMC6873944
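The workload and coverage figures quoted in the abstract above can be re-derived from the counts it reports. The following is a minimal Python sketch of that arithmetic; the function names `workload_reduction` and `coverage` are illustrative assumptions and are not taken from the paper.

```python
def workload_reduction(reviewed_semiauto: int, reviewed_traditional: int) -> float:
    """Fraction of human abstract review avoided by the semiautomated procedure."""
    return 1 - reviewed_semiauto / reviewed_traditional


def coverage(identified: int, total: int) -> float:
    """Fraction of gold-standard studies that the procedure identified."""
    return identified / total


# Counts as reported in the abstract.
reduction = workload_reduction(2_774, 16_941)
coverage_before_refs = coverage(132, 142)   # before reviewing references
coverage_after_refs = coverage(141, 142)    # after reviewing references

print(f"workload reduction: {reduction:.0%}")
print(f"coverage before reference review: {coverage_before_refs:.0%}")
print(f"coverage after reference review: {coverage_after_refs:.0%}")
```

Rounded to whole percentages, these reproduce the 84%, 93%, and 99% stated in the abstract.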

  • Incidental atypical hyperplasia/LCIS in mammoplasty specimens and subsequent risk of breast cancer Acevedo, F., Armengol, V., Deng, Z., Tang, R., Coopey, S., Mazzola, E., Lanahan, C., Braun, D., Yala, A., Barzilay, R., Li, C., Santus, E., Colwell, A., Guidi, A., Cetrulo, C., Garber, J., Smith, B. L., King, T. A., Hughes, K. S. AMER SOC CLINICAL ONCOLOGY. 2019
  • Pathologic findings in reduction mammoplasty specimens: a surrogate for the population prevalence of breast cancer and high-risk lesions BREAST CANCER RESEARCH AND TREATMENT Acevedo, F., Armengol, V., Deng, Z., Tang, R., Coopey, S. B., Braun, D., Yala, A., Barzilay, R., Li, C., Colwell, A., Guidi, A., Cetrulo, C. L., Garber, J., Smith, B. L., King, T., Hughes, K. S. 2019; 173 (1): 201-207

    Abstract

    Mammoplasty removes random samples of breast tissue from asymptomatic women, providing a unique method for evaluating the background prevalence of breast pathology in the normal population. Our goal was to identify the rate of atypical breast lesions and cancers in women of various ages in the largest mammoplasty cohort reported to date. We analyzed pathologic reports from patients undergoing bilateral mammoplasty, using a natural language processing algorithm, verified by human review. Patients with a prior history of breast cancer or atypia were excluded. A total of 4775 patients were deemed eligible. Median age was 40 (range 13-86) and was higher in patients with any incidental finding compared to patients with normal reports (52 vs. 39 years, p = 0.0001). Pathological findings were detected in 7.06% (337) of procedures. Benign high-risk lesions were found in 299 patients (6.26%). Invasive carcinoma and ductal carcinoma in situ were detected in 15 (0.31%) and 23 (0.48%) patients, respectively. The rate of atypias and cancers increased with age. The overall rate of abnormal findings in asymptomatic patients undergoing mammoplasty was 7.06%, increasing with age. As these results are based on a random sample of breast tissue, they likely underestimate the prevalence of abnormal findings in asymptomatic women.

    View details for DOI 10.1007/s10549-018-4962-0

    View details for Web of Science ID 000459448500021

    View details for PubMedID 30238276
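The prevalence percentages in the abstract above follow directly from the reported counts over the 4775 eligible patients. A short Python sketch of that check is given here; the dictionary keys and variable names are illustrative assumptions rather than terms defined by the paper.

```python
ELIGIBLE_PATIENTS = 4775  # eligible patients reported in the abstract

# Finding counts as reported in the abstract.
counts = {
    "benign high-risk lesion": 299,
    "invasive carcinoma": 15,
    "ductal carcinoma in situ": 23,
}

# Percentage of eligible patients with each finding, to two decimals.
rates = {name: round(100 * n / ELIGIBLE_PATIENTS, 2) for name, n in counts.items()}

# The three categories sum to the overall count of pathological findings.
total_findings = sum(counts.values())
overall_rate = round(100 * total_findings / ELIGIBLE_PATIENTS, 2)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"any finding: {total_findings} patients, {overall_rate}%")
```

The three categories add up to the 337 findings reported overall, and each rate matches the percentage given in the abstract.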

  • Managing Patient with Mutations in PALB2, CHEK2, or ATM CURRENT BREAST CANCER REPORTS Acevedo, F., Deng, Z., Armengol, V. D., Hughes, K. 2018; 10 (2): 74-82
  • Using Twitter to Better Understand the Spatiotemporal Patterns of Public Sentiment: A Case Study in Massachusetts, USA INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH Cao, X., MacNaughton, P., Deng, Z., Yin, J., Zhang, X., Allen, J. G. 2018; 15 (2)

    Abstract

    Twitter provides a rich database of spatiotemporal information about users who broadcast their real-time opinions, sentiment, and activities. In this paper, we sought to investigate the holistic influence of land use and time period on public sentiment. A total of 880,937 tweets posted by 26,060 active users were collected across Massachusetts (MA), USA, from 31 November 2012 to 3 June 2013. The IBM Watson Alchemy API (application program interface) was employed to quantify the sentiment scores conveyed by tweets on a large scale. Then we statistically analyzed the sentiment scores across different spaces and times. A multivariate linear mixed-effects model was used to quantify the fixed effects of land use and the time period on the variations in sentiment scores, considering the clustering effect of users. The results exposed clear spatiotemporal patterns of users' sentiment. Higher sentiment scores were mainly observed in the commercial and public areas, during the noon/evening and on weekends. Our findings suggest that social media outputs can be used to better understand the spatial and temporal patterns of public happiness and well-being in cities and regions.

    View details for DOI 10.3390/ijerph15020250

    View details for Web of Science ID 000426721400075

    View details for PubMedID 29393869

    View details for PubMedCentralID PMC5858319

  • Proportions and Risk Factors of Developing Multidrug Resistance Among Patients with Tuberculosis in China: A Population-Based Case-Control Study MICROBIAL DRUG RESISTANCE Huai, P., Huang, X., Cheng, J., Zhang, C., Wang, K., Wang, X., Yang, L., Deng, Z., Ma, W. 2016; 22 (8): 717-726

    Abstract

    Limited studies have been conducted to explore risk factors of developing multidrug-resistant tuberculosis (MDR-TB) in China. This study aimed to find the proportions and risk factors of developing MDR-TB in China among new patients and previously treated tuberculosis (TB) patients. A population-based case-control study was conducted from March 2010 to December 2013 in five cities in China. Proportions and risk factors of developing MDR-TB were calculated and analyzed separately for new patients and previously treated patients. The proportion of MDR-TB was 3.9% among new patients and 25.3% among previously treated patients in our study population. The proportion of extensively drug resistant TB was 0.1% among new patients and 1.4% among previously treated patients in our study population. Multivariate analysis found that being registered as migrants (odds ratio [OR] = 6.08; 95% confidence interval [CI]: 1.75-21.09), having more than three affected lung fields (OR = 2.18; 95% CI: 1.20-2.94), having more than 8 months of initial treatment (OR = 2.15; 95% CI: 1.09-4.28), having more than three prior episodes of anti-TB treatment (OR = 3.10; 95% CI: 1.48-6.48), and experiencing failure or continued worsening from the last treatment (OR = 3.82; 95% CI: 1.86-7.85) were associated with developing MDR-TB in previously treated patients with TB. Univariate analysis showed that less than 30 years of living in the same location (p = 0.034) was a risk factor for new patients with MDR-TB. The surveillance of multidrug resistance among patients with previously treated TB who also possess these risk factors and the management of patients with MDR-TB should be reinforced.

    View details for DOI 10.1089/mdr.2015.0186

    View details for Web of Science ID 000390414700015

    View details for PubMedID 27058017

  • Impacts of Tropical Cyclones and Accompanying Precipitation on Infectious Diarrhea in Cyclone Landing Areas of Zhejiang Province, China INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH Deng, Z., Xun, H., Zhou, M., Jiang, B., Wang, S., Guo, Q., Wang, W., Kang, R., Wang, X., Marley, G., Ma, W. 2015; 12 (2): 1054-1068

    Abstract

    Zhejiang Province, located in southeastern China, is frequently hit by tropical cyclones. This study quantified the associations between infectious diarrhea and the seven tropical cyclones that landed in Zhejiang from 2005-2011 to assess the impacts of the accompanying precipitation on the studied diseases. A unidirectional case-crossover study design was used to evaluate the impacts of tropical storms and typhoons on infectious diarrhea. Principal component analysis (PCA) was applied to eliminate multicollinearity. A multivariate logistic regression model was used to estimate the odds ratios (ORs) and the 95% confidence intervals (CIs). For all typhoons studied, the greatest impacts on bacillary dysentery and other infectious diarrhea were identified on lag 6 days (OR = 2.30, 95% CI: 1.81-2.93) and lag 5 days (OR = 3.56, 95% CI: 2.98-4.25), respectively. For all tropical storms, impacts on these diseases were highest on lag 2 days (OR = 2.47, 95% CI: 1.41-4.33) and lag 6 days (OR = 2.46, 95% CI: 1.69-3.56), respectively. The tropical cyclone precipitation was a risk factor for both bacillary dysentery and other infectious diarrhea when daily precipitation reached 25 mm and 50 mm with the largest OR = 3.25 (95% CI: 1.45-7.27) and OR = 3.05 (95% CI: 2.20-4.23), respectively. Both typhoons and tropical storms could contribute to an increase in risk of bacillary dysentery and other infectious diarrhea in Zhejiang. Tropical cyclone precipitation may also be a risk factor for these diseases when it reaches or is above 25 mm and 50 mm, respectively. Public health preventive and intervention measures should consider the adverse health impacts from tropical cyclones.

    View details for DOI 10.3390/ijerph120201054

    View details for Web of Science ID 000350209800001

    View details for PubMedID 25622139

    View details for PubMedCentralID PMC4344654