Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences
NATURE COMMUNICATIONS
2019; 10 (1)
View details for DOI 10.1038/s41467-019-11012-3
Snorkel: Rapid Training Data Creation with Weak Supervision
PROCEEDINGS OF THE VLDB ENDOWMENT
2017; 11 (3): 269–82
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
View details for PubMedID 29770249
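The labeling-function idea described in the abstract above can be illustrated with a toy example. This is a hypothetical sketch, not Snorkel's actual API: each function votes a label or abstains, and a simple majority vote stands in for the generative label model that Snorkel learns to denoise the votes. All task names and rules here are illustrative.

```python
# Toy weak-supervision sketch: noisy heuristics vote on a spam/ham task,
# and a majority vote over non-abstaining votes combines them.
# (Snorkel replaces this vote with a learned generative label model.)
ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_offer(text):          # heuristic 1: promotional keyword
    return SPAM if "free offer" in text.lower() else ABSTAIN

def lf_short_greeting(text):          # heuristic 2: short personal note
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_has_link(text):                # heuristic 3: links often indicate spam
    return SPAM if "http" in text else ABSTAIN

LFS = [lf_contains_offer, lf_short_greeting, lf_has_link]

def weak_label(text):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(weak_label("Claim your FREE OFFER now http://x.example"))  # 1 (SPAM)
print(weak_label("see you soon"))                                # 0 (HAM)
```

The payoff described in the paper is that such cheap, conflicting heuristics can label a large corpus that then trains a discriminative model.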
Measure what matters: Counts of hospitalized patients are a better metric for health system capacity planning for a reopening.
Journal of the American Medical Informatics Association : JAMIA
OBJECTIVE: Responding to the COVID-19 pandemic requires accurate forecasting of health system capacity requirements using readily available inputs. We examined whether testing and hospitalization data could help quantify the anticipated burden on the health system given a shelter-in-place (SIP) order. MATERIALS AND METHODS: 16,103 SARS-CoV-2 RT-PCR tests were performed on 15,807 patients at Stanford facilities between March 2 and April 11, 2020. We analyzed the fraction of tested patients who were confirmed positive for COVID-19, the fraction of those needing hospitalization, and the fraction requiring ICU admission over the 40 days between March 2 and April 11, 2020. RESULTS: We find a marked slowdown in the hospitalization rate within ten days of SIP, even as cases continued to rise. We also find a shift towards younger patients in the age distribution of those testing positive for COVID-19 over the four weeks of SIP. The impact of this shift is a divergence between increasing positive case confirmations and slowing new hospitalizations, both of which affect the demand on health systems. CONCLUSION: Without using local hospitalization rates and the age distribution of positive patients, current models are likely to overestimate the resource burden of COVID-19. It is imperative that health systems start using these data to quantify the effects of SIP and aid reopening planning.
View details for DOI 10.1093/jamia/ocaa076
View details for PubMedID 32548636
Estimating the efficacy of symptom-based screening for COVID-19.
NPJ digital medicine
2020; 3: 95
There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.
View details for DOI 10.1038/s41746-020-0300-0
View details for PubMedID 32695885
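The screening question in the abstract above comes down to how well presenting symptoms separate positives from negatives, which is conventionally summarized by sensitivity and specificity. A minimal, generic sketch of those metrics (the counts below are made up for illustration and are not the study's data):

```python
def screen_metrics(y_true, y_pred):
    """Sensitivity and specificity of a binary symptom-based screen.
    y_true: 1 = truly positive for the virus; y_pred: 1 = flagged by symptoms."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # fraction of true cases the screen catches
    specificity = tn / (tn + fp)   # fraction of non-cases correctly passed over
    return sensitivity, specificity

# Illustrative data only, not from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
sens, spec = screen_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Low specificity is what undermines symptom-based screening: overlapping symptom profiles across respiratory viruses produce many false positives.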
Snorkel: rapid training data creation with weak supervision.
The VLDB journal : very large data bases : a publication of the VLDB Endowment
2020; 29 (2): 709–30
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research laboratories. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling trade-offs in this new setting and propose an optimizer for automating trade-off decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the US Department of Veterans Affairs and the US Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
View details for DOI 10.1007/s00778-019-00552-1
View details for PubMedID 32214778
The accuracy vs. coverage trade-off in patient-facing diagnosis models.
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
2020; 2020: 298–307
A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, these models must have high accuracy while covering a meaningful space of symptoms and diagnoses. To the best of our knowledge, this paper is the first to study the trade-off between the coverage of the model and its performance for diagnosis. To this end, we learn diagnosis models with different coverage from EHR data. We find a 1% drop in top-3 accuracy for every 10 diseases added to the coverage. We also observe that model complexity does not affect performance, with linear models performing as well as neural networks.
View details for PubMedID 32477649
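The top-3 accuracy metric reported in the abstract above counts a case as correct if the true diagnosis appears among the model's three highest-scored candidates. A generic sketch of that computation (not the paper's code; diagnosis names and scores are invented):

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of cases whose true diagnosis is among the k highest-scored.
    scores: one dict per case mapping candidate diagnoses to model scores."""
    hits = 0
    for per_case_scores, true_label in zip(scores, labels):
        ranked = sorted(per_case_scores, key=per_case_scores.get, reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Illustrative scores only.
scores = [
    {"flu": 0.6, "cold": 0.3, "covid": 0.1},
    {"flu": 0.2, "cold": 0.5, "covid": 0.3},
]
labels = ["cold", "flu"]
print(top_k_accuracy(scores, labels, k=2))  # 0.5
```

Evaluating at several k values makes the coverage/accuracy trade-off in the paper measurable: widening the candidate set adds diseases that compete for the top-k slots.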
Assessing the accuracy of automatic speech recognition for psychotherapy.
NPJ digital medicine
2020; 3: 82
Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.
View details for DOI 10.1038/s41746-020-0285-8
View details for PubMedID 32550644
View details for PubMedCentralID PMC7270106
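The word error rate figures in the abstract above are the standard ASR metric: word-level edit distance (substitutions, insertions, deletions) divided by the reference length. A self-contained sketch of that computation (the example sentences are invented, not transcripts from the study):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via word-level Levenshtein distance (dynamic programming)."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (feel -> fell) and one deletion (today): 2 edits / 4 words.
print(word_error_rate("i feel sad today", "i fell sad"))  # 0.5
```

As the abstract notes, an acceptable WER depends on the use case: aggregate language-pattern analysis tolerates far more error than individual-level safety monitoring.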
Medical device surveillance with electronic health records.
NPJ digital medicine
2019; 2: 94
Post-market medical device surveillance is a challenge facing manufacturers, regulatory agencies, and health care providers. Electronic health records are valuable sources of real-world evidence for assessing device safety and tracking device-related patient outcomes over time. However, distilling this evidence remains challenging, as information is fractured across clinical notes and structured records. Modern machine learning methods for machine reading promise to unlock increasingly complex information from text, but face barriers due to their reliance on large and expensive hand-labeled training sets. To address these challenges, we developed and validated state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data. Using hip replacements-one of the most common implantable devices-as a test case, our methods accurately extracted implant details and reports of complications and pain from electronic health records with up to 96.3% precision, 98.5% recall, and 97.4% F1, improved classification performance by 12.8-53.9% over rule-based methods, and detected over six times as many complication events compared to using structured data alone. Using these additional events to assess complication-free survivorship of different implant systems, we found significant variation between implants, including for risk of revision surgery, which could not be detected using coded data alone. Patients with revision surgeries had more hip pain mentions in the post-hip replacement, pre-revision period compared to patients with no evidence of revision surgery (mean hip pain mentions 4.97 vs. 3.23; t = 5.14; p < 0.001). Some implant models were associated with higher or lower rates of hip pain mentions. 
Our methods complement existing surveillance mechanisms by requiring orders of magnitude less hand-labeled training data, offering a scalable solution for national medical device surveillance using electronic health records.
View details for DOI 10.1038/s41746-019-0168-z
View details for PubMedID 31583282
View details for PubMedCentralID PMC6761113
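The precision, recall, and F1 figures reported in the abstract above are linked by the harmonic mean, F1 = 2PR / (P + R). A quick check against the reported numbers:

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures from the abstract: precision 96.3%, recall 98.5% -> F1 ~97.4%.
print(round(f1(0.963, 0.985), 3))  # 0.974
```

The harmonic mean penalizes imbalance, so a high F1 here confirms the extraction pipeline is strong on both axes rather than trading one for the other.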
ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information.
Proceedings of machine learning research
2017; 68: 59–74
In healthcare applications, temporal variables that encode movement, health status, and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics, and medical exam data. However, current methods do not jointly optimize over structured covariates and time series in the feature extraction process. We present ShortFuse, a method that boosts the accuracy of deep learning models for time series by explicitly modeling temporal interactions and dependencies with structured covariates. ShortFuse introduces hybrid convolutional and LSTM cells that incorporate the covariates via weights that are shared across the temporal domain. On two biomedical applications, forecasting osteoarthritis-related cartilage degeneration and predicting surgical outcomes for cerebral palsy patients, ShortFuse outperforms competing models by 3%, matching or exceeding the accuracy of models that use features engineered by domain experts.
View details for PubMedID 30882086
Brundlefly at SemEval-2016 Task 12: Recurrent Neural Networks vs. Joint Inference for Clinical Temporal Information Extraction
Jason Alan Fries
View details for DOI 10.18653/v1/S16-1198