All Publications


  • Computer Vision Foundation Models in Endoscopy: Proof of Concept in Oropharyngeal Cancer. The Laryngoscope. Paderno, A., Rau, A., Bedi, N., Bossi, P., Mercante, G., Piazza, C., Holsinger, F. C. 2024.

    Abstract

    To evaluate the performance of vision transformer-derived image embeddings for distinguishing between normal and neoplastic tissues in the oropharynx, and to investigate the potential of computer vision (CV) foundation models in medical imaging. Computational study using endoscopic frames, focused on the application of a self-supervised vision transformer model (DINOv2) for tissue classification. High-definition endoscopic images were used to extract image patches, which were then normalized and processed with the DINOv2 model to obtain embeddings. These embeddings served as input to a standard support vector machine (SVM) to classify the tissues as neoplastic or normal. The model's discriminative performance was validated using an 80-20 train-validation split. From 38 endoscopic NBI videos, 327 image patches were analyzed. The classification results in the validation cohort demonstrated high accuracy (92%) and precision (89%), with perfect recall (100%) and an F1-score of 94%. The receiver operating characteristic (ROC) curve yielded an area under the curve (AUC) of 0.96. The use of large vision model-derived embeddings effectively differentiated between neoplastic and normal oropharyngeal tissues. This study supports the feasibility of employing CV foundation models such as DINOv2 in the endoscopic evaluation of mucosal lesions, potentially augmenting diagnostic precision in otorhinolaryngology. Level of evidence: 4. Laryngoscope, 2024.

    DOI: 10.1002/lary.31534

    PubMedID: 38850247
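
    A minimal Python sketch of the kind of pipeline the abstract above describes: DINOv2 embeddings extracted from image patches and fed to an SVM with an 80-20 split. This is illustrative only, not the authors' code; the random stand-in patches, patch size, and SVM kernel are assumptions, and only the torch.hub DINOv2 entry point and the scikit-learn calls are real APIs.

      # Minimal illustrative sketch (not the authors' code): DINOv2 embeddings + SVM.
      # Random arrays stand in for real endoscopic image patches.
      import numpy as np
      import torch
      from PIL import Image
      from torchvision import transforms
      from sklearn.svm import SVC
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                   f1_score, roc_auc_score)

      # Small DINOv2 backbone from the official hub; forward() returns the CLS embedding.
      model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
      model.eval()

      # Standard ImageNet normalization, resized to a multiple of the 14-pixel patch size.
      preprocess = transforms.Compose([
          transforms.Resize((224, 224)),
          transforms.ToTensor(),
          transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
      ])

      def embed(img: Image.Image) -> torch.Tensor:
          """Return the DINOv2 embedding of one image patch."""
          with torch.no_grad():
              return model(preprocess(img).unsqueeze(0)).squeeze(0)

      # Hypothetical data: replace with real patches and neoplastic/normal labels.
      rng = np.random.default_rng(0)
      patches = [Image.fromarray(rng.integers(0, 256, (280, 280, 3), dtype=np.uint8))
                 for _ in range(40)]
      labels = rng.integers(0, 2, size=40)

      X = torch.stack([embed(p) for p in patches]).numpy()
      X_tr, X_va, y_tr, y_va = train_test_split(X, labels, test_size=0.2,
                                                stratify=labels, random_state=0)

      clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
      pred = clf.predict(X_va)
      prob = clf.predict_proba(X_va)[:, 1]
      print("accuracy :", accuracy_score(y_va, pred))
      print("precision:", precision_score(y_va, pred, zero_division=0))
      print("recall   :", recall_score(y_va, pred, zero_division=0))
      print("F1       :", f1_score(y_va, pred, zero_division=0))
      print("ROC AUC  :", roc_auc_score(y_va, prob))

    With real data, the random patches would be replaced by normalized patches cropped from the NBI frames, and the printed metrics correspond to the accuracy, precision, recall, F1, and ROC AUC figures reported in the abstract.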

  • SimCol3D - 3D reconstruction during colonoscopy challenge. Medical Image Analysis. Rau, A., Bano, S., Jin, Y., Azagra, P., Morlana, J., Kader, R., Sanderson, E., Matuszewski, B. J., Lee, J. Y., Lee, D. J., Posner, E., Frank, N., Elangovan, V., Raviteja, S., Li, Z., Liu, J., Lalithkumar, S., Islam, M., Ren, H., Lovat, L. B., Montiel, J. M., Stoyanov, D. 2024; 96: 103195.

    Abstract

    Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains difficult. Learning-based approaches hold promise as robust alternatives, but necessitate extensive datasets. Establishing a benchmark dataset, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world and representatives from academia and industry participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction from synthetic colonoscopy images is robustly solvable, while pose estimation remains an open research question.

    DOI: 10.1016/j.media.2024.103195

    PubMedID: 38815359
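
    Since the paper concludes that depth prediction from synthetic colonoscopy images is robustly solvable, a short sketch of how a predicted depth map can be scored against simulator ground truth may help. The metrics below (absolute relative error, RMSE, and the delta < 1.25 accuracy threshold) are standard monocular-depth measures and are assumptions for illustration, not the challenge's official evaluation code; the array shapes and depth range are likewise hypothetical.

      # Minimal illustrative sketch of depth-map evaluation for a synthetic colonoscopy
      # frame; metrics and array shapes are generic, not the official challenge scoring.
      import numpy as np

      def depth_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> dict:
          """Compare a predicted depth map with ground truth on valid pixels."""
          mask = gt > eps                            # ignore invalid / zero-depth pixels
          pred, gt = pred[mask], gt[mask]
          abs_rel = float(np.mean(np.abs(pred - gt) / gt))        # absolute relative error
          rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))        # root-mean-square error
          delta1 = float(np.mean(np.maximum(pred / gt, gt / pred) < 1.25))  # within 25%
          return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}

      # Hypothetical example: simulator ground truth and a noisy stand-in "prediction".
      rng = np.random.default_rng(0)
      gt = rng.uniform(0.01, 0.20, size=(480, 480))               # depth in metres
      pred = gt * (1.0 + 0.05 * rng.standard_normal(gt.shape))
      print(depth_metrics(pred, gt))

    Pose estimation, the sub-challenge the paper identifies as still open, is evaluated differently (e.g., via trajectory error between predicted and ground-truth camera poses) and is not covered by this sketch.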