I am a PhD student in Bioengineering specializing in the intersection of biodesign and machine learning for understanding, treating, and tracking neuropsychiatric conditions.
As an interdisciplinary translational researcher, I work across several fields: my thesis spans the engineering, design, scientific, algorithmic, and clinical questions associated with developing new technologies to transform healthcare and diagnostics.
Before coming to Stanford, I completed an undergraduate degree in Computer Science at Rice University in Houston, Texas.
Education & Certifications
Master of Science, Stanford University, CS-MS (2018)
BA, Rice University, Computer Science (2015)
Current Research and Scholarly Interests
I am currently a graduate student in Bioengineering specializing in biomedical data science. I draw on and contribute to techniques in crowdsourced healthcare, applied machine learning, computational psychiatry, translational bioinformatics, human-computer interaction, and mobile/wearable systems.
A maximum flow-based network approach for identification of stable noncoding biomarkers associated with the multigenic neurological condition, autism.
2021; 14 (1): 28
BACKGROUND: Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often result in unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders. RESULTS: We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet demonstrates great variability in the features selected from over 230,000 possible variants. In order to improve model stability, we extract the variants assigned non-zero weights in each of 5 cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features identified by the original classifier. CONCLUSION: Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way towards biomarker-based diagnosis methods for complex genetic disorders.
View details for DOI 10.1186/s13040-021-00262-x
View details for PubMedID 33941233
Estimating sequencing error rates using families.
2021; 14 (1): 27
BACKGROUND: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. RESULTS: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method's versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. CONCLUSION: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
View details for DOI 10.1186/s13040-021-00259-6
View details for PubMedID 33892748
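The observable signal that the abstract above builds on, a Mendelian error in a parent-child trio, can be illustrated with a short sketch. This is not the paper's estimation method, which the abstract does not spell out; it only shows how an inconsistent trio genotype can be flagged. The genotype encoding (alternate-allele counts 0, 1, or 2 at a biallelic autosomal site), the function names, and the toy calls are all assumptions made for illustration.

```python
from itertools import product

# Alleles a parent can transmit, keyed by that parent's alternate-allele count.
# Assumed encoding: 0 = hom-ref, 1 = het, 2 = hom-alt at a biallelic autosomal site.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(mother: int, father: int, child: int) -> bool:
    """Return True if the child's genotype cannot be inherited from the parents."""
    possible = {
        m + f for m, f in product(TRANSMITTABLE[mother], TRANSMITTABLE[father])
    }
    return child not in possible


def mendelian_error_rate(trios) -> float:
    """Fraction of (mother, father, child) genotype calls that are inconsistent."""
    trios = list(trios)
    errors = sum(is_mendelian_error(m, f, c) for m, f, c in trios)
    return errors / len(trios)


# Hypothetical genotype calls at five sites for one trio.
calls = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1), (0, 2, 1)]
print(mendelian_error_rate(calls))  # 0.4: the second and fourth sites are inconsistent
```

Only some sequencing errors surface this way (an error that yields a still-consistent trio is invisible), which is why the abstract describes Mendelian errors as "a fraction of sequencing errors" that must then be extrapolated to genome-wide estimates.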
Crowdsourced privacy-preserved feature tagging of short home videos for machine learning ASD detection.
2021; 11 (1): 7620
Standard medical diagnosis of mental health conditions requires licensed experts who are increasingly outnumbered by those at risk, limiting reach. We test the hypothesis that a trustworthy crowd of non-experts can efficiently annotate behavioral features needed for accurate machine learning detection of the common childhood developmental disorder Autism Spectrum Disorder (ASD) for children under 8 years old. We implement a novel process for identifying and certifying a trustworthy distributed workforce for video feature extraction, selecting a workforce of 102 workers from a pool of 1,107. Two previously validated ASD logistic regression classifiers, evaluated against parent-reported diagnoses, were used to assess the accuracy of the trusted crowd's ratings of unstructured home videos. A representative balanced sample of videos (N=50) was evaluated with and without face box and pitch shift privacy alterations, with AUROC and AUPRC scores > 0.98. With both privacy-preserving modifications, sensitivity is preserved (96.0%) while maintaining specificity (80.0%) and accuracy (88.0%) at levels comparable to prior classification methods without alterations. We find that machine learning classification from features extracted by a certified nonexpert crowd achieves high performance for ASD detection from natural home videos of the child at risk and maintains high sensitivity when privacy-preserving mechanisms are applied. These results suggest that privacy-safeguarded crowdsourced analysis of short home videos can help enable rapid and mobile machine-learning detection of developmental delays in children.
View details for DOI 10.1038/s41598-021-87059-4
View details for PubMedID 33828118
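As a worked check of the sensitivity, specificity, and accuracy figures reported in the abstract above, the sketch below computes all three from a confusion matrix. The counts are an assumption: the abstract says the 50-video sample was balanced, so a 25/25 split is used here, and the true/false positive counts are simply the ones that reproduce the reported percentages, not values taken from the paper.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) for a binary classifier."""
    sensitivity = tp / (tp + fn)            # share of ASD videos detected
    specificity = tn / (tn + fp)            # share of non-ASD videos cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts for a balanced N=50 sample (25 ASD, 25 non-ASD videos).
sens, spec, acc = screening_metrics(tp=24, fn=1, tn=20, fp=5)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # 96.0% 80.0% 88.0%
```

Under that assumed split, the three reported values are mutually consistent: 24 of 25 cases detected, 20 of 25 controls cleared, and 44 of 50 videos classified correctly.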
Indels in SARS-CoV-2 occur at template-switching hotspots.
2021; 14 (1): 20
The evolutionary dynamics of SARS-CoV-2 have been carefully monitored since the COVID-19 pandemic began in December 2019. However, analysis has focused primarily on single nucleotide polymorphisms and largely ignored the role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution. Using sequences from the GISAID database, we catalogue over 100 insertions and deletions in the SARS-CoV-2 consensus sequences. We hypothesize that these indels are artifacts of recombination events between SARS-CoV-2 replicates whereby RNA-dependent RNA polymerase (RdRp) re-associates with a homologous template at a different locus ("imperfect homologous recombination"). We provide several independent pieces of evidence that suggest this. (1) The indels from the GISAID consensus sequences are clustered at specific regions of the genome. (2) These regions are also enriched for 5' and 3' breakpoints in the transcription regulatory site (TRS) independent transcriptome, presumably sites of RdRp template-switching. (3) Within raw reads, these indel hotspots have cases of both high intra-host heterogeneity and intra-host homogeneity, suggesting that these indels are both consequences of de novo recombination events within a host and artifacts of previous recombination. We briefly analyze the indels in the context of RNA secondary structure, noting that indels preferentially occur in "arms" and loop structures of the predicted folded RNA, suggesting that secondary structure may be a mechanism for TRS-independent template-switching in SARS-CoV-2 or other coronaviruses. These insights into the relationship between structural variation and recombination in SARS-CoV-2 can improve our reconstructions of the SARS-CoV-2 evolutionary history as well as our understanding of the process of RdRp template-switching in RNA viruses.
View details for DOI 10.1186/s13040-021-00251-0
View details for PubMedID 33743803
Achieving Trustworthy Biomedical Data Solutions.
Pacific Symposium on Biocomputing
2021; 26: 1–13
Privacy and trust of biomedical solutions that capture and share data is an issue rising to the center of public attention and discourse. While large-scale academic, medical, and industrial research initiatives must collect increasing amounts of personal biomedical data from patient stakeholders, which is central to ensuring precision health becomes a reality, methods for providing sufficient privacy in biomedical databases and conveying a sense of trust to the user are equally crucial for the field of biocomputing to advance with the grace of those stakeholders. If the intended audience does not trust new precision health innovations, funding and support for these efforts will inevitably be limited. It is therefore crucial for the field to address these issues in a timely manner. Here we describe current research directions towards achieving trustworthy biomedical informatics solutions.
View details for PubMedID 33690999
Selection of trustworthy crowd workers for telemedical diagnosis of pediatric autism spectrum disorder.
Pacific Symposium on Biocomputing
2021; 26: 14–25
Crowd-powered telemedicine has the potential to revolutionize healthcare, especially during times that require remote access to care. However, sharing private health data with strangers from around the world is not compatible with data privacy standards, requiring a stringent filtration process to recruit reliable and trustworthy workers who can go through the proper training and security steps. The key challenge, then, is to identify capable, trustworthy, and reliable workers through high-fidelity evaluation tasks without exposing any sensitive patient data during the evaluation process. We contribute a set of experimentally validated metrics for assessing the trustworthiness and reliability of crowd workers tasked with providing behavioral feature tags to unstructured videos of children with autism and matched neurotypical controls. The workers are blinded to diagnosis and blinded to the goal of using the features to diagnose autism. These behavioral labels are fed as input to a previously validated binary logistic regression classifier for detecting autism cases using categorical feature vectors. While the metrics do not incorporate any ground truth labels of child diagnosis, linear regression using the 3 correlative metrics as input can predict the mean probability of the correct class of each worker with a mean average error of 7.51% for performance on the same set of videos and 10.93% for performance on a distinct balanced video set with different children. These results indicate that crowd workers can be recruited for performance based largely on behavioral metrics on a crowdsourced task, enabling an affordable way to filter crowd workforces into a trustworthy and reliable diagnostic workforce.
View details for PubMedID 33691000
Feature replacement methods enable reliable home video analysis for machine learning detection of autism.
2020; 10 (1): 21245
Autism Spectrum Disorder is a neuropsychiatric condition affecting 53 million children worldwide and for which early diagnosis is critical to the outcome of behavior therapies. Machine learning applied to features manually extracted from readily accessible videos (e.g., from smartphones) has the potential to scale this diagnostic process. However, nearly unavoidable variability in video quality can lead to missing features that degrade algorithm performance. To manage this uncertainty, we evaluated the impact of missing values and feature imputation methods on two previously published autism detection classifiers, trained on standard-of-care instrument scoresheets and tested on ratings of 140 YouTube videos of children. We compare the baseline method of listwise deletion to classic univariate and multivariate techniques. We also introduce a feature replacement method that, based on a score, selects a feature from an expanded dataset to fill in the missing value. The replacement feature selected can be identical for all records (general) or automatically adjusted to the record considered (dynamic). Our results show that general and dynamic feature replacement methods achieve a higher performance than classic univariate and multivariate methods, supporting the hypothesis that algorithmic management can maintain the fidelity of video-based diagnostics in the face of missing values and variable video quality.
View details for DOI 10.1038/s41598-020-76874-w
View details for PubMedID 33277527
Precision Telemedicine through Crowdsourced Machine Learning: Testing Variability of Crowd Workers for Video-Based Autism Feature Recognition.
Journal of personalized medicine
2020; 10 (3)
Mobilized telemedicine is becoming a key, and even necessary, facet of both precision health and precision medicine. In this study, we evaluate the capability and potential of a crowd of virtual workers (defined as vetted members of popular crowdsourcing platforms) to aid in the task of diagnosing autism. We evaluate workers when crowdsourcing the task of providing categorical ordinal behavioral ratings to unstructured public YouTube videos of children with autism and neurotypical controls. To evaluate emerging patterns that are consistent across independent crowds, we target workers from distinct geographic loci on two crowdsourcing platforms: an international group of workers on Amazon Mechanical Turk (MTurk) (N = 15) and Microworkers from Bangladesh (N = 56), Kenya (N = 23), and the Philippines (N = 25). We feed worker responses as input to a validated diagnostic machine learning classifier trained on clinician-filled electronic health records. We find that regardless of crowd platform or targeted country, workers vary in the average confidence of the correct diagnosis predicted by the classifier. The best worker responses produce a mean probability of the correct class above 80% and over one standard deviation above 50%, accuracy and variability on par with experts according to prior studies. There is a weak correlation between mean time spent on task and mean performance (r = 0.358, p = 0.005). These results demonstrate that while the crowd can produce accurate diagnoses, there are intrinsic differences in crowdworker ability to rate behavioral features. We propose a novel strategy for recruitment of crowdsourced workers to ensure high quality diagnostic evaluations of autism, and potentially many other pediatric behavioral health conditions. Our approach represents a viable step in the direction of crowd-based approaches for more scalable and affordable precision medicine.
View details for DOI 10.3390/jpm10030086
View details for PubMedID 32823538
Game theoretic centrality: a novel approach to prioritize disease candidate genes by combining biological networks with the Shapley value.
2020; 21 (1): 356
BACKGROUND: Complex human health conditions with etiological heterogeneity like Autism Spectrum Disorder (ASD) often pose a challenge for traditional genome-wide association study approaches in defining a clear genotype to phenotype model. Coalitional game theory (CGT) is an exciting method that can consider the combinatorial effect of groups of variants working in concert to produce a phenotype. CGT has been applied to associate likely-gene-disrupting variants encoded from whole genome sequence data to ASD; however, this previous approach cannot take into account prior biological knowledge. Here we extend CGT to incorporate a priori knowledge from biological networks through a game theoretic centrality measure based on Shapley value to rank genes by their relevance, that is, the individual gene's synergistic influence in a gene-to-gene interaction network. Game theoretic centrality extends the notion of Shapley value to the evaluation of a gene's contribution to the overall connectivity of its corresponding node in a biological network. RESULTS: We implemented and applied game theoretic centrality to rank genes on whole genomes from 756 multiplex autism families. Top ranking genes with the highest game theoretic centrality in both the weighted and unweighted approaches were enriched for pathways previously associated with autism, including pathways of the immune system. Four of the selected genes (HLA-A, HLA-B, HLA-G, and HLA-DRB1) have also been implicated in ASD and further support the link between ASD and the human leukocyte antigen complex. CONCLUSIONS: Game theoretic centrality can prioritize influential, disease-associated genes within biological networks, and assist in the decoding of polygenic associations to complex disorders like autism.
View details for DOI 10.1186/s12859-020-03693-1
View details for PubMedID 32787845
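To make the Shapley-value idea in the abstract above concrete, here is a minimal sketch that computes exact Shapley values for nodes of a tiny network. The abstract does not give the paper's characteristic function, so this example swaps in a simple "coverage" game (a coalition's value is the number of nodes it contains or touches), which is a common textbook choice for game-theoretic centrality and should not be read as the authors' formulation. The graph, the node names, and the function names are all hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            totals[p] += value(coalition) - before
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical interaction network: one hub gene linked to three others.
EDGES = [("hub", "g1"), ("hub", "g2"), ("hub", "g3")]
NEIGHBORS = {}
for u, v in EDGES:
    NEIGHBORS.setdefault(u, set()).add(v)
    NEIGHBORS.setdefault(v, set()).add(u)


def coverage(coalition):
    """Assumed characteristic function: nodes in the coalition or adjacent to it."""
    covered = set(coalition)
    for node in coalition:
        covered |= NEIGHBORS[node]
    return len(covered)


scores = shapley_values(NEIGHBORS, coverage)
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, score)
```

In this toy network the hub receives the largest share (1.75 versus 0.75 for each leaf), and the shares sum to the value of the full coalition. Enumerating every ordering is only feasible for a handful of nodes; genome-scale networks would need a closed-form or sampling-based approach.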
The Performance of Emotion Classifiers for Children With Parent-Reported Autism: Quantitative Feasibility Study.
JMIR mental health
2020; 7 (4): e13174
BACKGROUND: Autism spectrum disorder (ASD) is a developmental disorder characterized by deficits in social communication and interaction, and restricted and repetitive behaviors and interests. The incidence of ASD has increased in recent years; it is now estimated that approximately 1 in 40 children in the United States are affected. Due in part to increasing prevalence, access to treatment has become constrained. Hope lies in mobile solutions that provide therapy through artificial intelligence (AI) approaches, including facial and emotion detection AI models developed by mainstream cloud providers, available directly to consumers. However, these solutions may not be sufficiently trained for use in pediatric populations. OBJECTIVE: Emotion classifiers available off-the-shelf to the general public through Microsoft, Amazon, Google, and Sighthound are well-suited to the pediatric population, and could be used for developing mobile therapies targeting aspects of social communication and interaction, perhaps accelerating innovation in this space. This study aimed to test these classifiers directly with image data from children with parent-reported ASD recruited through crowdsourcing. METHODS: We used a mobile game called Guess What? that challenges a child to act out a series of prompts displayed on the screen of the smartphone held on the forehead of his or her care provider. The game is intended to be a fun and engaging way for the child and parent to interact socially, for example, the parent attempting to guess what emotion the child is acting out (eg, surprised, scared, or disgusted). During a 90-second game session, as many as 50 prompts are shown while the child acts, and the video records the actions and expressions of the child. Due in part to the fun nature of the game, it is a viable way to remotely engage pediatric populations, including the autism population through crowdsourcing. We recruited 21 children with ASD to play the game and gathered 2602 emotive frames following their game sessions. These data were used to evaluate the accuracy and performance of four state-of-the-art facial emotion classifiers to develop an understanding of the feasibility of these platforms for pediatric research. RESULTS: All classifiers performed poorly for every evaluated emotion except happy. None of the classifiers correctly labeled over 60.18% (1566/2602) of the evaluated frames. Moreover, none of the classifiers correctly identified more than 11% (6/51) of the angry frames and 14% (10/69) of the disgust frames. CONCLUSIONS: The findings suggest that commercial emotion classifiers may be insufficiently trained for use in digital approaches to autism treatment and treatment tracking. Secure, privacy-preserving methods to increase labeled training data are needed to boost the models' performance before they can be used in AI-enabled approaches to social therapy of the kind that is common in autism treatments.
View details for DOI 10.2196/13174
View details for PubMedID 32234701
Feature Selection and Dimension Reduction of Social Autism Data.
Pacific Symposium on Biocomputing
2020; 25: 707–18
Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimension representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multilayer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. High redundancy of features has implications for replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimension representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
View details for PubMedID 31797640
A Mobile Game for Automatic Emotion-Labeling of Images.
IEEE transactions on games
2020; 12 (2): 213–18
In this paper, we describe challenges in the development of a mobile charades-style game for delivery of social training to children with Autism Spectrum Disorder (ASD). Providing real-time feedback and adapting game difficulty in response to the child's performance necessitates the integration of emotion classifiers into the system. Due to the limited performance of existing emotion recognition platforms for children with ASD, we propose a novel technique to automatically extract emotion-labeled frames from video acquired from game sessions, which we hypothesize can be used to train new emotion classifiers to overcome these limitations. Our technique, which uses probability scores from three different classifiers and meta information from game sessions, correctly identified 83% of frames compared to a baseline of 51.6% from the best emotion classification API evaluated in our work.
View details for DOI 10.1109/tg.2018.2877325
View details for PubMedID 32551410
View details for PubMedCentralID PMC7301713
Data-Driven Diagnostics and the Potential of Mobile Artificial Intelligence for Digital Therapeutic Phenotyping in Computational Psychiatry.
Biological psychiatry. Cognitive neuroscience and neuroimaging
Data science and digital technologies have the potential to transform diagnostic classification. Digital technologies enable the collection of big data, and advances in machine learning and artificial intelligence enable scalable, rapid, and automated classification of medical conditions. In this review, we summarize and categorize various data-driven methods for diagnostic classification. In particular, we focus on autism as an example of a challenging disorder due to its highly heterogeneous nature. We begin by describing the frontier of data science methods for the neuropsychiatry of autism. We discuss early signs of autism as defined by existing pen-and-paper-based diagnostic instruments and describe data-driven feature selection techniques for determining the behaviors that are most salient for distinguishing children with autism from neurologically typical children. We then describe data-driven detection techniques, particularly computer vision and eye tracking, that provide a means of quantifying behavioral differences between cases and controls. We also describe methods of preserving the privacy of collected videos and prior efforts of incorporating humans in the diagnostic loop. Finally, we summarize existing digital therapeutic interventions that allow for data capture and longitudinal outcome tracking as the diagnosis moves along a positive trajectory. Digital phenotyping of autism is paving the way for quantitative psychiatry more broadly and will set the stage for more scalable, accessible, and precise diagnostic techniques in the field.
View details for DOI 10.1016/j.bpsc.2019.11.015
View details for PubMedID 32085921
MOBILE COMPUTING AND COMMUNICATIONS REVIEW
2019; 23 (2): 35–38
View details for Web of Science ID 000498622500008
Validity of Online Screening for Autism: Crowdsourcing Study Comparing Paid and Unpaid Diagnostic Tasks.
Journal of medical Internet research
2019; 21 (5): e13668
BACKGROUND: Obtaining a diagnosis of neuropsychiatric disorders such as autism requires long waiting times that can exceed a year and can be prohibitively expensive. Crowdsourcing approaches may provide a scalable alternative that can accelerate general access to care and permit underserved populations to obtain an accurate diagnosis. OBJECTIVE: We aimed to perform a series of studies to explore whether paid crowd workers on Amazon Mechanical Turk (AMT) and citizen crowd workers on a public website shared on social media can provide accurate online detection of autism, conducted via crowdsourced ratings of short home video clips. METHODS: Three online studies were performed: (1) a paid crowdsourcing task on AMT (N=54) where crowd workers were asked to classify 10 short video clips of children as "Autism" or "Not autism," (2) a more complex paid crowdsourcing task (N=27) with only those raters who correctly rated ≥8 of the 10 videos during the first study, and (3) a public unpaid study (N=115) identical to the first study. RESULTS: For Study 1, the mean score of the participants who completed all questions was 7.50/10 (SD 1.46). When only analyzing the workers who scored ≥8/10 (n=27/54), there was a weak negative correlation between the time spent rating the videos and the sensitivity (rho=-0.44, P=.02). For Study 2, the mean score of the participants rating new videos was 6.76/10 (SD 0.59). The average deviation between the crowdsourced answers and gold standard ratings provided by two expert clinical research coordinators was 0.56, with an SD of 0.51 (maximum possible SD is 3). All paid crowd workers who scored 8/10 in Study 1 either expressed enjoyment in performing the task in Study 2 or provided no negative comments. For Study 3, the mean score of the participants who completed all questions was 6.67/10 (SD 1.61). There were weak correlations between age and score (r=0.22, P=.014), age and sensitivity (r=-0.19, P=.04), number of family members with autism and sensitivity (r=-0.195, P=.04), and number of family members with autism and precision (r=-0.203, P=.03). A two-tailed t test between the scores of the paid workers in Study 1 and the unpaid workers in Study 3 showed a significant difference (P<.001). CONCLUSIONS: Many paid crowd workers on AMT enjoyed answering screening questions from videos, suggesting higher intrinsic motivation to make quality assessments. Paid crowdsourcing provides promising screening assessments of pediatric autism with an average deviation <20% from professional gold standard raters, which is potentially a clinically informative estimate for parents. Parents of children with autism likely overfit their intuition to their own affected child. This work provides preliminary demographic data on raters who may have higher ability to recognize and measure features of autism across its wide range of phenotypic manifestations.
View details for DOI 10.2196/13668
View details for PubMedID 31124463
Detecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study.
Journal of medical Internet research
2019; 21 (4)
Effect of Wearable Digital Intervention for Improving Socialization in Children With Autism Spectrum Disorder: A Randomized Clinical Trial.
JAMA pediatrics
2019; 173 (5): 446–54
Importance: Autism behavioral therapy is effective but expensive and difficult to access. While mobile technology-based therapy can alleviate wait-lists and scale for increasing demand, few clinical trials exist to support its use for autism spectrum disorder (ASD) care. Objective: To evaluate the efficacy of Superpower Glass, an artificial intelligence-driven wearable behavioral intervention for improving social outcomes of children with ASD. Design, Setting, and Participants: A randomized clinical trial in which participants received the Superpower Glass intervention plus standard of care applied behavioral analysis therapy and control participants received only applied behavioral analysis therapy. Assessments were completed at the Stanford University Medical School, and enrolled participants used the Superpower Glass intervention in their homes. Children aged 6 to 12 years with a formal ASD diagnosis who were currently receiving applied behavioral analysis therapy were included. Families were recruited between June 2016 and December 2017. The first participant was enrolled on November 1, 2016, and the last appointment was completed on April 11, 2018. Data analysis was conducted between April and October 2018. Interventions: The Superpower Glass intervention, deployed via Google Glass (worn by the child) and a smartphone app, promotes facial engagement and emotion recognition by detecting facial expressions and providing reinforcing social cues. Families were asked to conduct 20-minute sessions at home 4 times per week for 6 weeks. Main Outcomes and Measures: Four socialization measures were assessed using an intention-to-treat analysis with a Bonferroni test correction. Results: Overall, 71 children (63 boys [89%]; mean [SD] age, 8.38 [2.46] years) diagnosed with ASD were enrolled (40 [56.3%] were randomized to treatment, and 31 [43.7%] were randomized to control). Children receiving the intervention showed significant improvements on the Vineland Adaptive Behaviors Scale socialization subscale compared with treatment as usual controls (mean [SD] treatment impact, 4.58 [1.62]; P=.005). Positive mean treatment effects were also found for the other 3 primary measures but not to a significance threshold of P=.0125. Conclusions and Relevance: The observed 4.58-point average gain on the Vineland Adaptive Behaviors Scale socialization subscale is comparable with gains observed with standard of care therapy. To our knowledge, this is the first randomized clinical trial to demonstrate efficacy of a wearable digital intervention to improve social behavior of children with ASD. The intervention reinforces facial engagement and emotion recognition, suggesting either or both could be a mechanism of action driving the observed improvement. This study underscores the potential of digital home therapy to augment the standard of care. Trial Registration: ClinicalTrials.gov identifier: NCT03569176.
View details for PubMedID 30907929
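The significance threshold of P=.0125 quoted in the Superpower Glass trial abstract above is consistent with a Bonferroni correction across the four primary measures. The following is a minimal sketch of that arithmetic, not the trial's analysis code: the family-wise alpha of 0.05 is an assumption (the abstract does not state it), and the function and variable names are hypothetical.

```python
def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Assumption: a conventional family-wise alpha of 0.05 (not stated in the abstract).
FAMILY_ALPHA = 0.05
# From the abstract: four socialization measures were assessed.
N_PRIMARY_MEASURES = 4

threshold = bonferroni_threshold(FAMILY_ALPHA, N_PRIMARY_MEASURES)

# From the abstract: P value reported for the Vineland socialization subscale.
vineland_p = 0.005
vineland_significant = vineland_p < threshold

print(f"per-test threshold: {threshold:.4f}")
print(f"Vineland subscale significant: {vineland_significant}")
```

Under these assumptions the per-test threshold works out to 0.05 / 4 = 0.0125, which matches the value in the abstract, and the reported P=.005 falls below it.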
Interactive programming paradigm for real-time experimentation with remote living matter.
Proceedings of the National Academy of Sciences of the United States of America
View details for PubMedID 30824592
Identification and Quantification of Gaps in Access to Autism Resources in the United States: An Infodemiological Study.
Journal of medical Internet research
2019; 21 (7): e13094
Autism affects 1 in every 59 children in the United States, according to estimates from the Centers for Disease Control and Prevention's Autism and Developmental Disabilities Monitoring Network in 2018. Although similar rates of autism are reported in rural and urban areas, rural families report greater difficulty in accessing resources. An overwhelming number of families experience long waitlists for diagnostic and therapeutic services. The objective of this study was to accurately identify gaps in access to autism care using GapMap, a mobile platform that connects families with local resources while continuously collecting up-to-date autism resource epidemiological information. After being extracted from various databases, resources were deduplicated, validated, and allocated into 7 categories based on the keywords identified on the resource website. The average distance between the individuals from a simulated autism population and the nearest autism resource in our database was calculated for each US county. Resource load, an approximation of demand over supply for diagnostic resources, was calculated for each US county. There are approximately 28,000 US resources validated on the GapMap database, each allocated into 1 or more of the 7 categories. States with the greatest distances to autism resources included Alaska, Nevada, Wyoming, Montana, and Arizona. Of the 7 resource categories, diagnostic resources were the most underrepresented, comprising only 8.83% (2472/28,003) of all resources. Alarmingly, 83.86% (2635/3142) of all US counties lacked any diagnostic resources. States with the highest diagnostic resource load included West Virginia, Kentucky, Maine, Mississippi, and New Mexico. Results from this study demonstrate the sparsity and uneven distribution of diagnostic resources in the United States, which may contribute to the lengthy waitlists and travel distances that are barriers to receiving a diagnosis in specific regions.
More data are needed on autism diagnosis demand to better quantify resource needs across the United States.
View details for DOI 10.2196/13094
View details for PubMedID 31293243
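The two county-level quantities described in the GapMap abstract above (average distance to the nearest resource, and resource load as demand over supply) can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the abstract does not say which distance metric was used, so great-circle (haversine) distance is assumed here, and the function names and the choice to report an infinite load for counties with no diagnostic resources are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_nearest_distance_km(people, resources):
    """Average, over simulated individuals, of the distance to the closest resource.

    `people` and `resources` are lists of (lat, lon) tuples.
    """
    if not people or not resources:
        raise ValueError("need at least one person and one resource")
    nearest = [min(haversine_km(*p, *r) for r in resources) for p in people]
    return sum(nearest) / len(nearest)


def resource_load(demand, supply):
    """Demand over supply; infinite when there is no supply (an assumption of this sketch)."""
    return math.inf if supply == 0 else demand / supply


# Hypothetical toy county: two simulated individuals, two resources.
people = [(0.0, 0.0), (0.0, 3.0)]
resources = [(0.0, 1.0), (0.0, 2.0)]
print(round(mean_nearest_distance_km(people, resources), 1))
print(resource_load(demand=120, supply=4))
```

A real analysis would run this per county over the full simulated population; the brute-force nearest-neighbor search here is only practical for small inputs.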
Scientific Discovery Games for Biomedical Research.
Annual review of biomedical data science
2019; 2 (1): 253-279
Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.
View details for DOI 10.1146/annurev-biodatasci-072018-021139
View details for PubMedID 34308269
View details for PubMedCentralID PMC8297398
Labeling images with facial emotion and the potential for pediatric healthcare.
Artificial intelligence in medicine
2019; 98: 77–86
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by repetitive behaviors, narrow interests, and deficits in social interaction and communication ability. An increasing emphasis is being placed on the development of innovative digital and mobile systems for their potential in therapeutic applications outside of clinical environments. Due to recent advances in the field of computer vision, various emotion classifiers have been developed, which have potential to play a significant role in mobile screening and therapy for developmental delays that impair emotion recognition and expression. However, these classifiers are trained on datasets of predominantly neurotypical adults and can sometimes fail to generalize to children with autism. The need to improve existing classifiers and develop new systems that overcome these limitations necessitates novel methods to crowdsource labeled emotion data from children. In this paper, we present a mobile charades-style game, Guess What?, from which we derive egocentric video with a high density of varied emotion from a 90-second game session. We then present a framework for semi-automatic labeled frame extraction from these videos using meta information from the game session coupled with classification confidence scores. Results show that 94%, 81%, 92%, and 56% of frames were automatically labeled correctly for categories disgust, neutral, surprise, and scared respectively, though performance for angry and happy did not improve significantly from the baseline.
View details for DOI 10.1016/j.artmed.2019.06.004
View details for PubMedID 31521254
Guess What?: Towards Understanding Autism from Structured Video Using Facial Affect.
Journal of healthcare informatics research
2019; 3: 43–66
Autism Spectrum Disorder (ASD) is a condition affecting an estimated 1 in 59 children in the United States. Due to delays in diagnosis and imbalances in coverage, it is necessary to develop new methods of care delivery that can appropriately empower children and caregivers by capitalizing on mobile tools and wearable devices for use outside of clinical settings. In this paper, we present a mobile charades-style game, Guess What?, used for the acquisition of structured video from children with ASD for behavioral disease research. We then apply face tracking and emotion recognition algorithms to videos acquired through Guess What? game play. By analyzing facial affect in response to various prompts, we demonstrate that engagement and facial affect can be quantified and measured using real-time image processing algorithms: an important first-step for future therapies, at-home screenings, and outcome measures based on home video. Our study of eight subjects demonstrates the efficacy of this system for deriving highly emotive structured video from children with ASD through an engaging gamified mobile platform, while revealing the most efficacious prompts and categories for producing diverse emotion in participants.
View details for DOI 10.1007/s41666-018-0034-9
View details for PubMedID 33313475
Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder
WORLD SCIENTIFIC PUBL CO PTE LTD. 2019: 260–71
View details for Web of Science ID 000461866400024
Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2019; 24: 260–71
Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ1-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.
View details for PubMedID 30864328
Detecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study.
Journal of medical Internet research
2019; 21 (4): e13822
Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20 and 100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's "risk scores" for autism. We achieved an accuracy of 92% (95% CI 88%-97%) on US videos using a classifier built on five features. Using videos of Bangladeshi children collected from Dhaka Shishu Children's Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions. Although our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos (75% accuracy, 95% CI 71%-78%), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: the first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise.
We also determine Shapley values for the most important features in the classifier to understand how the classifiers' process aligns with clinical intuition. Using these techniques, we achieved an accuracy (area under the curve [AUC]) of 76% (SD 3%) and sensitivity of 76% (SD 4%) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of 85% (SD 5%) and sensitivity of 76% (SD 6%) for identifying children with ASD from those predicted to have other developmental delays. These results show promise for using a mobile video-based and machine learning-directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value.
View details for PubMedID 31017583
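The Bangladeshi home video abstract above describes aggregating scores from several raters with expertise-based weights and then passing the result through two classification layers. The abstract does not specify the weighting scheme, so the following is only a generic sketch of the idea using a weighted average. The weights, the 0.5 decision threshold, and all function names are hypothetical and are not taken from the paper.

```python
def weighted_score(scores, weights):
    """Weighted average of per-rater classifier scores.

    `scores` are per-rater probabilities in [0, 1]; `weights` are non-negative
    expertise weights (how they are derived is left open in this sketch).
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def two_layer_label(atypical_score, asd_score, threshold=0.5):
    """Layer 1 separates typical from atypical; layer 2 separates ASD from non-ASD.

    The 0.5 threshold is an illustrative assumption.
    """
    if atypical_score < threshold:
        return "typical"
    return "ASD" if asd_score >= threshold else "non-ASD atypical"


# Hypothetical example: three raters, the first trusted more than the others.
rater_weights = [3.0, 1.0, 1.0]
layer1 = weighted_score([0.9, 0.4, 0.6], rater_weights)  # typical vs atypical
layer2 = weighted_score([0.8, 0.3, 0.5], rater_weights)  # ASD vs non-ASD
print(two_layer_label(layer1, layer2))
```

Keeping the aggregation separate from the layered decision makes it straightforward to swap in a different weighting scheme without touching the classification logic.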
- Addendum to the Acknowledgements: Validity of Online Screening for Autism: Crowdsourcing Study Comparing Paid and Unpaid Diagnostic Tasks. Journal of medical Internet research 2019; 21 (6): e14950
Mobile detection of autism through machine learning on home video: A development and prospective validation study.
2018; 15 (11): e1002705
BACKGROUND: The standard approaches to diagnosing autism spectrum disorder (ASD) evaluate between 20 and 100 behaviors and take several hours to complete. This has in part contributed to long wait times for a diagnosis and subsequent delays in access to therapy. We hypothesize that the use of machine learning analysis on home video can speed the diagnosis without compromising accuracy. We have analyzed item-level records from 2 standard diagnostic instruments to construct machine learning classifiers optimized for sparsity, interpretability, and accuracy. In the present study, we prospectively test whether the features from these optimized models can be extracted by blinded nonexpert raters from 3-minute home videos of children with and without ASD to arrive at a rapid and accurate machine learning autism classification. METHODS AND FINDINGS: We created a mobile web portal for video raters to assess 30 behavioral features (e.g., eye contact, social smile) that are used by 8 independent machine learning models for identifying ASD, each with >94% accuracy in cross-validation testing and subsequent independent validation from previous work. We then collected 116 short home videos of children with autism (mean age = 4 years 10 months, SD = 2 years 3 months) and 46 videos of typically developing children (mean age = 2 years 11 months, SD = 1 year 2 months). Three raters blind to the diagnosis independently measured each of the 30 features from the 8 models, with a median time to completion of 4 minutes. Although several models (consisting of alternating decision trees, support vector machine [SVM], logistic regression [LR], radial kernel, and linear SVM) performed well, a sparse 5-feature LR classifier (LR5) yielded the highest accuracy (area under the curve [AUC]: 92% [95% CI 88%-97%]) across all ages tested.
We used a prospectively collected independent validation set of 66 videos (33 ASD and 33 non-ASD) and 3 independent rater measurements to validate the outcome, achieving lower but comparable accuracy (AUC: 89% [95% CI 81%-95%]). Finally, we applied LR to the 162-video-feature matrix to construct an 8-feature model, which achieved 0.93 AUC (95% CI 0.90-0.97) on the held-out test set and 0.86 on the validation set of 66 videos. Validation on children with an existing diagnosis limited the ability to generalize the performance to undiagnosed populations. CONCLUSIONS: These results support the hypothesis that feature tagging of home videos for machine learning classification of autism can yield accurate outcomes in short time frames, using mobile devices. Further work will be needed to confirm that this approach can accelerate autism diagnosis at scale.
View details for PubMedID 30481180
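The home video study above reports classifier performance as area under the ROC curve (AUC). As a reminder of what that number measures, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way; the scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. This O(n * m) form is meant for clarity, not speed.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for ASD (positive) and non-ASD (negative) videos.
asd_scores = [0.9, 0.8]
non_asd_scores = [0.1, 0.85]
print(pairwise_auc(asd_scores, non_asd_scores))
```

In this toy example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.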
A Programming Toolkit for Automating Biophysics Experiments with Microorganism Swarms
CELL PRESS. 2018: 183A
View details for Web of Science ID 000430439600169
Feasibility Testing of a Wearable Behavioral Aid for Social Learning in Children with Autism
APPLIED CLINICAL INFORMATICS
2018; 9 (1): 129–40
Recent advances in computer vision and wearable technology have created an opportunity to introduce mobile therapy systems for autism spectrum disorders (ASD) that can respond to the increasing demand for therapeutic interventions; however, feasibility questions must be answered first. We studied the feasibility of a prototype therapeutic tool for children with ASD using Google Glass, examining whether children with ASD would wear such a device, if providing the emotion classification will improve emotion recognition, and how emotion recognition differs between ASD participants and neurotypical controls (NC). We ran a controlled laboratory experiment with 43 children: 23 with ASD and 20 NC. Children identified static facial images on a computer screen with one of 7 emotions in 3 successive batches: the first with no information about emotion provided to the child, the second with the correct classification from the Glass labeling the emotion, and the third again without emotion information. We then trained a logistic regression classifier on the emotion confusion matrices generated by the two information-free batches to predict ASD versus NC. All 43 children were comfortable wearing the Glass. ASD and NC participants who completed the computer task with Glass providing audible emotion labeling (n = 33) showed increased accuracies in emotion labeling, and the logistic regression classifier achieved an accuracy of 72.7%. Further analysis suggests that the ability to recognize surprise, fear, and neutrality may distinguish ASD cases from NC. This feasibility study supports the utility of a wearable device for social affective learning in ASD children and demonstrates subtle differences in how ASD and NC children perform on an emotion recognition task.
View details for DOI 10.1055/s-0038-1626727
View details for Web of Science ID 000428690000006
View details for PubMedID 29466819
View details for PubMedCentralID PMC5821509
Analysis of Sex and Recurrence Ratios in Simplex and Multiplex Autism Spectrum Disorder Implicates Sex-Specific Alleles as Inheritance Mechanism
IEEE. 2018: 1470–77
View details for Web of Science ID 000458654000258
Exploratory study examining the at-home feasibility of a wearable tool for social-affective learning in children with autism.
NPJ digital medicine
2018; 1: 32
Although standard behavioral interventions for autism spectrum disorder (ASD) are effective therapies for social deficits, they face criticism for being time-intensive and overdependent on specialists. Earlier starting age of therapy is a strong predictor of later success, but waitlists for therapies can be 18 months long. To address these complications, we developed Superpower Glass, a machine-learning-assisted software system that runs on Google Glass and an Android smartphone, designed for use during social interactions. This pilot exploratory study examines our prototype tool's potential for social-affective learning for children with autism. We sent our tool home with 14 families and assessed changes from intake to conclusion through the Social Responsiveness Scale (SRS-2), a facial affect recognition task (EGG), and qualitative parent reports. A repeated-measures one-way ANOVA demonstrated a decrease in SRS-2 total scores by an average 7.14 points (F(1,13) = 33.20, p < .001; higher scores indicate higher ASD severity). EGG scores also increased by an average 9.55 correct responses (F(1,10) = 11.89, p < .01). Parents reported increased eye contact and greater social acuity. This feasibility study supports using mobile technologies for potential therapeutic purposes.
View details for DOI 10.1038/s41746-018-0035-3
View details for PubMedID 31304314
View details for PubMedCentralID PMC6550272
- SuperpowerGlass: A Wearable Aid for the At-Home Therapy of Children with Autism Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2017
- Bioty: A cloud-based development toolkit for programming experiments and interactive applications with living cells bioRxiv. 2017
- Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems 2016
- Human Perception of Swarm Robot Motion 2016
- ScaleMed: A methodology for iterative mHealth clinical trials 17th International Conference on E-health Networking, Application & Services (HealthCom) 2015
- Rethinking the Imaging Pipeline for Energy‐Efficient Privacy‐Preserving Continuous Mobile Vision SID Symposium Digest of Technical Papers. 2015
- The wireless data drain of users, apps, & platforms ACM SIGMOBILE Mobile Computing and Communications Review 2013; 17 (4)