All Publications

  • Developing machine learning models to personalize care levels among emergency room patients for hospital admission. Journal of the American Medical Informatics Association : JAMIA Nguyen, M., Corbin, C. K., Eulalio, T., Ostberg, N. P., Machiraju, G., Marafino, B. J., Baiocchi, M., Rose, C., Chen, J. H. 2021


    OBJECTIVE: To develop prediction models for intensive care unit (ICU) vs non-ICU level-of-care need within 24 hours of inpatient admission for emergency department (ED) patients using electronic health record data.

    MATERIALS AND METHODS: Using records of 41 654 ED visits to a tertiary academic center from 2015 to 2019, we tested 4 algorithms (feed-forward neural networks, regularized regression, random forests, and gradient-boosted trees) to predict ICU vs non-ICU level-of-care within 24 hours and at the 24th hour following admission. Simple-feature models included patient demographics, Emergency Severity Index (ESI), and vital sign summary. Complex-feature models added all vital signs, lab results, and counts of diagnosis, imaging, procedures, medications, and lab orders.

    RESULTS: The best-performing model, a gradient-boosted tree using a full feature set, achieved an AUROC of 0.88 (95% CI: 0.87-0.89) and AUPRC of 0.65 (95% CI: 0.63-0.68) for predicting ICU care need within 24 hours of admission. The logistic regression model using ESI achieved an AUROC of 0.67 (95% CI: 0.65-0.70) and AUPRC of 0.37 (95% CI: 0.35-0.40). Using a discrimination threshold, such as 0.6, the positive predictive value, negative predictive value, sensitivity, and specificity were 85%, 89%, 30%, and 99%, respectively. Vital signs were the most important predictors.

    DISCUSSION AND CONCLUSIONS: Undertriaging admitted ED patients who subsequently require ICU care is common and associated with poorer outcomes. Machine learning models using readily available electronic health record data predict subsequent need for ICU admission with good discrimination, substantially better than the benchmarking ESI system. The results could be used in a multitiered clinical decision-support system to improve ED triage.

    View details for DOI 10.1093/jamia/ocab118

    View details for PubMedID 34402507
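The abstract above reports positive predictive value, negative predictive value, sensitivity, and specificity at a discrimination threshold of 0.6. As a rough illustration of how those four quantities follow from a set of labels and predicted probabilities, here is a minimal sketch. The function name `threshold_metrics` and the toy labels and scores are hypothetical, made up for this example; they are not the study's code or data.

```python
import math


def threshold_metrics(y_true, y_score, threshold=0.6):
    """Confusion-matrix metrics at a fixed probability threshold.

    y_true:  iterable of 0/1 labels (1 = needed ICU-level care, by assumption)
    y_score: iterable of predicted probabilities in [0, 1]
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when the denominator is zero (e.g. no positive predictions).
        return numerator / denominator if denominator else math.nan

    return {
        "ppv": ratio(tp, tp + fp),          # of flagged patients, share truly positive
        "npv": ratio(tn, tn + fn),          # of unflagged patients, share truly negative
        "sensitivity": ratio(tp, tp + fn),  # share of true positives that were flagged
        "specificity": ratio(tn, tn + fp),  # share of true negatives left unflagged
    }


# Toy data, purely illustrative (not from the study).
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.7, 0.2, 0.1]
metrics = threshold_metrics(toy_labels, toy_scores, threshold=0.6)
print(metrics)
```

Raising the threshold generally flags fewer patients, which tends to raise specificity and lower sensitivity; this is consistent with the high-specificity, low-sensitivity pattern reported at 0.6.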

  • Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease. Cell de Goede, O. M., Nachun, D. C., Ferraro, N. M., Gloudemans, M. J., Rao, A. S., Smail, C., Eulalio, T. Y., Aguet, F., Ng, B., Xu, J., Barbeira, A. N., Castel, S. E., Kim-Hellmuth, S., Park, Y., Scott, A. J., Strober, B. J., GTEx Consortium, Brown, C. D., Wen, X., Hall, I. M., Battle, A., Lappalainen, T., Im, H. K., Ardlie, K. G., Mostafavi, S., Quertermous, T., Kirkegaard, K., Montgomery, S. B., Anand, S., Gabriel, S., Getz, G. A., Graubert, A., Hadley, K., Handsaker, R. E., Huang, K. H., Li, X., MacArthur, D. G., Meier, S. R., Nedzel, J. L., Nguyen, D. T., Segre, A. V., Todres, E., Balliu, B., Bonazzola, R., Brown, A., Conrad, D. F., Cotter, D. J., Cox, N., Das, S., Dermitzakis, E. T., Einson, J., Engelhardt, B. E., Eskin, E., Flynn, E. D., Fresard, L., Gamazon, E. R., Garrido-Martin, D., Gay, N. R., Guigo, R., Hamel, A. R., He, Y., Hoffman, P. J., Hormozdiari, F., Hou, L., Jo, B., Kasela, S., Kashin, S., Kellis, M., Kwong, A., Li, X., Liang, Y., Mangul, S., Mohammadi, P., Munoz-Aguirre, M., Nobel, A. B., Oliva, M., Park, Y., Parsana, P., Reverter, F., Rouhana, J. M., Sabatti, C., Saha, A., Stephens, M., Stranger, B. E., Teran, N. A., Vinuela, A., Wang, G., Wright, F., Wucher, V., Zou, Y., Ferreira, P. G., Li, G., Mele, M., Yeger-Lotem, E., Bradbury, D., Krubit, T., McLean, J. A., Qi, L., Robinson, K., Roche, N. V., Smith, A. M., Tabor, D. E., Undale, A., Bridge, J., Brigham, L. E., Foster, B. A., Gillard, B. M., Hasz, R., Hunter, M., Johns, C., Johnson, M., Karasik, E., Kopen, G., Leinweber, W. F., McDonald, A., Moser, M. T., Myer, K., Ramsey, K. D., Roe, B., Shad, S., Thomas, J. A., Walters, G., Washington, M., Wheeler, J., Jewell, S. D., Rohrer, D. C., Valley, D. R., Davis, D. A., Mash, D. C., Barcus, M. E., Branton, P. A., Sobin, L., Barker, L. K., Gardiner, H. M., Mosavel, M., Siminoff, L. A., Flicek, P., Haeussler, M., Juettemann, T., Kent, W. J., Lee, C. M., Powell, C. C., Rosenbloom, K. R., Ruffier, M., Sheppard, D., Taylor, K., Trevanion, S. J., Zerbino, D. R., Abell, N. S., Akey, J., Chen, L., Demanelis, K., Doherty, J. A., Feinberg, A. P., Hansen, K. D., Hickey, P. F., Jasmine, F., Jiang, L., Kaul, R., Kibriya, M. G., Li, J. B., Li, Q., Lin, S., Linder, S. E., Pierce, B. L., Rizzardi, L. F., Skol, A. D., Smith, K. S., Snyder, M., Stamatoyannopoulos, J., Tang, H., Wang, M., Carithers, L. J., Guan, P., Koester, S. E., Little, A. R., Moore, H. M., Nierras, C. R., Rao, A. K., Vaught, J. B., Volpi, S. 2021


    Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

    View details for DOI 10.1016/j.cell.2021.03.050

    View details for PubMedID 33864768

  • Nonsense-mediated decay is highly stable across individuals and tissues. American journal of human genetics Teran, N. A., Nachun, D. C., Eulalio, T., Ferraro, N. M., Smail, C., Rivas, M. A., Montgomery, S. B. 2021


    Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.

    View details for DOI 10.1016/j.ajhg.2021.06.008

    View details for PubMedID 34216550

  • Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer's and Parkinson's diseases. Nature genetics Corces, M. R., Shcherbina, A., Kundu, S., Gloudemans, M. J., Fresard, L., Granja, J. M., Louie, B. H., Eulalio, T., Shams, S., Bagdatli, S. T., Mumbach, M. R., Liu, B., Montine, K. S., Greenleaf, W. J., Kundaje, A., Montgomery, S. B., Chang, H. Y., Montine, T. J. 2020


    Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer's and Parkinson's diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson's disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.

    View details for DOI 10.1038/s41588-020-00721-x

    View details for PubMedID 33106633