Machine learning models using mobile game play accurately classify children with autism.
Digitally delivered healthcare is well suited to address current inequities in care delivery caused by barriers to accessing healthcare facilities. As the COVID-19 pandemic phases out, we have a unique opportunity to capitalize on the current familiarity with telemedicine and continue to advocate for mainstream adoption of remote care delivery. In this paper, we specifically focus on the ability of GuessWhat?, a smartphone-based charades-style gamified therapeutic intervention for autism spectrum disorder (ASD), to generate a signal that distinguishes children with ASD from neurotypical (NT) children. We demonstrate the feasibility of using "in-the-wild", naturalistic gameplay data to distinguish children with ASD from NT children by training a random forest classifier to discern the two classes (AUROC = 0.745, recall = 0.769). This performance demonstrates the potential for GuessWhat? to facilitate screening for ASD in historically difficult-to-reach communities. To further examine this potential, future work should expand the training sample and interrogate differences in predictive ability across demographic groups.
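The classification setup described above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the authors' pipeline: the features here are synthetic placeholders standing in for gameplay-derived features, and the metrics shown (AUROC, recall) are simply the two quantities the abstract reports.

```python
# Hedged sketch: binary classification with a random forest, reporting
# AUROC and recall. Synthetic data stands in for gameplay features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, recall_score
from sklearn.model_selection import train_test_split

# Placeholder features/labels (ASD = 1, NT = 0 in this toy setup).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# AUROC uses predicted probabilities; recall uses hard predictions.
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
recall = recall_score(y_te, clf.predict(X_te))
print(f"AUROC={auroc:.3f} recall={recall:.3f}")
```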
View details for DOI 10.1016/j.ibmed.2022.100057
View details for PubMedID 36035501
Training and Profiling a Pediatric Facial Expression Classifier for Children on Mobile Devices: Machine Learning Study.
JMIR formative research
BACKGROUND: Implementing automated facial expression recognition on mobile devices could provide an accessible diagnostic and therapeutic tool for those who struggle to recognize facial expressions, including children with developmental behavioral conditions such as autism. Although recent advances have produced more accurate facial expression classifiers for children, existing models are too computationally expensive to be deployed on smartphones. OBJECTIVE: In this study, we explored the deployment of several state-of-the-art facial expression classifiers designed for use on mobile devices. We applied various post-training optimization techniques and evaluated both classification performance and efficiency on a Motorola Moto G6 phone. We additionally explored the importance of training our classifiers on children rather than adults and evaluated the performance of our models across ethnic groups. METHODS: We collected images from twelve public datasets and used video frames crowdsourced from the GuessWhat app to train our classifiers. All images were annotated for 7 expressions: neutral, fear, happiness, sadness, surprise, anger, and disgust. We tested three copies of each of five convolutional neural network architectures: MobileNetV3-Small 1.0x, MobileNetV2 1.0x, EfficientNetB0, MobileNetV3-Large 1.0x, and NASNetMobile. The first copy was trained on images of children, the second on images of adults, and the third on all datasets. We evaluated each model against the Child Affective Facial Expression (CAFE) set, both in its entirety and by ethnicity.
We then performed weight pruning, weight clustering, and quantization-aware training where possible and profiled the performance of each model on the Moto G6. RESULTS: Our best model, a MobileNetV3-Large network pre-trained on ImageNet and trained on all data, achieved 65.78% balanced accuracy and a 65.31% F1-score on CAFE while achieving a 90-millisecond inference latency on the Moto G6. This balanced accuracy is only 1.12% lower than the current state of the art for CAFE, a model with 13.91x more parameters that was unable to run on the Moto G6 due to its size, even when fully optimized. When trained solely on children, this model achieved 60.57% balanced accuracy and a 60.29% F1-score, while when trained only on adults it achieved 53.36% balanced accuracy and a 53.10% F1-score. Although the MobileNetV3-Large trained on all datasets achieved nearly 60% F1-scores across all ethnicities, South Asian and African American children received balanced accuracies as much as 11.56% lower and F1-scores as much as 11.25% lower than other groups. CONCLUSIONS: This work demonstrates that with specialized design and optimization techniques, facial expression classifiers can become lightweight enough to run on mobile devices while still achieving state-of-the-art performance. This study also suggests a "data shift" between the facial expressions of children and adults, with our classifiers performing much better when trained on children. In addition, we find that our classifiers perform significantly worse on certain underrepresented ethnic groups, such as South Asian and African American children, than on groups such as European Caucasian children, despite a similar quality of data. The models developed in this study can be integrated into mobile health therapies to help diagnose ASD and to provide targeted therapeutic treatment to children.
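Two of the optimizations named in the abstract, magnitude-based weight pruning and weight clustering, can be illustrated conceptually on a plain weight matrix. This is a hedged sketch using NumPy and scikit-learn rather than the study's actual mobile-optimization tooling; the matrix, sparsity target (50%), and cluster count (16) are illustrative choices, not values from the paper.

```python
# Conceptual sketch of two compression techniques, applied to a toy
# weight matrix rather than a real network:
#  - magnitude pruning: zero out the smallest-magnitude weights
#  - weight clustering: replace weights with a small set of shared centroids
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))  # toy "layer weights"

# Prune 50% of weights by magnitude.
threshold = np.quantile(np.abs(w), 0.5)
pruned = np.where(np.abs(w) >= threshold, w, 0.0)

# Cluster the surviving weights into 16 shared values; a model whose
# weights take only 16 distinct values compresses far better on disk.
nonzero = pruned[pruned != 0].reshape(-1, 1)
km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(nonzero)
clustered = pruned.copy()
clustered[clustered != 0] = km.cluster_centers_[km.labels_, 0]

sparsity = float(np.mean(clustered == 0))
n_unique = len(np.unique(clustered))
print(f"sparsity={sparsity:.2f} unique_values={n_unique}")
```

In practice these steps are applied during or after training with framework-specific tooling, and quantization-aware training additionally simulates low-precision arithmetic in the forward pass so the weights adapt to it.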
View details for DOI 10.2196/39917
View details for PubMedID 35962462
Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study.
JMIR pediatrics and parenting
2022; 5 (2): e26760
Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface, called HollywoodSquares, to gamify the human labeling effort, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels across all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgusted, and neutral expressions evoked by children. The classifier achieved 66.9% balanced accuracy and a 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) set, as well as 79.1% balanced accuracy and a 78% F1-score on CAFE Subset A, a subset with at least 60% human agreement on emotion labels. This performance is at least 10% higher than that of all previously developed classifiers evaluated against CAFE, the best of which reached 56% balanced accuracy even when combining "anger" and "disgust" into a single class. This work validates that mobile games designed for pediatric therapies can generate high volumes of domain-relevant data to train state-of-the-art classifiers for tasks helpful to precision health efforts.
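The two evaluation metrics quoted throughout these abstracts, balanced accuracy and F1-score, can be computed with scikit-learn. This is an illustrative sketch on a tiny hand-made 7-class prediction set (the label strings and predictions are made up for the example, not drawn from CAFE); balanced accuracy averages per-class recall, which is why it is preferred over plain accuracy on class-imbalanced emotion datasets.

```python
# Sketch: balanced accuracy (mean per-class recall) and macro F1 on a
# toy 7-class emotion task. Labels and predictions are illustrative.
from sklearn.metrics import balanced_accuracy_score, f1_score

labels = ["neutral", "fear", "happy", "sad", "surprise", "anger", "disgust"]
y_true = ["happy", "sad", "happy", "anger", "neutral", "fear", "disgust", "surprise"]
y_pred = ["happy", "sad", "happy", "anger", "neutral", "sad", "disgust", "surprise"]

# One "fear" sample is misclassified as "sad": fear's recall drops to 0,
# pulling the class-averaged scores down.
bal_acc = balanced_accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro", labels=labels, zero_division=0)
print(f"balanced_accuracy={bal_acc:.3f} macro_f1={macro_f1:.3f}")
```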
View details for DOI 10.2196/26760
View details for PubMedID 35394438
TikTok for good: Creating a diverse emotion expression database
IEEE. 2022: 2495-2505
View details for DOI 10.1109/CVPRW56347.2022.00279
View details for Web of Science ID 000861612702072
Activity Recognition with Moving Cameras and Few Training Examples: Applications for Detection of Autism-Related Headbanging
ASSOC COMPUTING MACHINERY. 2021
View details for DOI 10.1145/3411763.3451701
View details for Web of Science ID 000759178502011