All Publications

  • Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma. PLoS genetics Tanigawa, Y., Wainberg, M., Karjalainen, J., Kiiskinen, T., Venkataraman, G., Lemmela, S., Turunen, J. A., Graham, R. R., Havulinna, A. S., Perola, M., Palotie, A., FinnGen, Daly, M. J., Rivas, M. A. 2020; 16 (5): e1008682


    Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (beta = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10⁻⁹ and 1.07 × 10⁻¹³ for corneal compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10⁻¹² for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.

    View details for DOI 10.1371/journal.pgen.1008682

    View details for PubMedID 32369491

  • Automated Classification of Radiographic Knee Osteoarthritis Severity Using Deep Neural Networks. Radiology. Artificial intelligence Thomas, K. A., Kidzinski, L., Halilaj, E., Fleming, S. L., Venkataraman, G. R., Oei, E. H., Gold, G. E., Delp, S. L. 2020; 2 (2): e190065


    Purpose: To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists. Materials and Methods: Radiographs from the Osteoarthritis Initiative staged by a radiologist committee using the Kellgren-Lawrence (KL) system were used. Before using the images as input to a convolutional neural network model, they were standardized and augmented automatically. The model was trained with 32116 images, tuned with 4074 images, evaluated with a 4090-image test set, and compared to two individual radiologists using a 50-image test subset. Saliency maps were generated to reveal features used by the model to determine KL grades. Results: With committee scores used as ground truth, the model had an average F1 score of 0.70 and an accuracy of 0.71 for the full test set. For the 50-image subset, the best individual radiologist had an average F1 score of 0.60 and an accuracy of 0.60; the model had an average F1 score of 0.64 and an accuracy of 0.66. Cohen weighted kappa between the committee and model was 0.86, comparable to intraexpert repeatability. Saliency maps identified sites of osteophyte formation as influential to predictions. Conclusion: An end-to-end interpretable model that takes full radiographs as input and predicts KL scores with state-of-the-art accuracy, performs as well as musculoskeletal radiologists, and does not require manual image preprocessing was developed. Saliency maps suggest the model's predictions were based on clinically relevant information. Supplemental material is available for this article. © RSNA, 2020.

    View details for DOI 10.1148/ryai.2020190065

    View details for PubMedID 32280948

  • FasTag: Automatic text classification of unstructured medical narratives. PloS one Venkataraman, G. R., Pineda, A. L., Bear Don't Walk IV, O. J., Zehnder, A. M., Ayyar, S., Page, R. L., Bustamante, C. D., Rivas, M. A. 2020; 15 (6): e0234647


    Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.

    View details for DOI 10.1371/journal.pone.0234647

    View details for PubMedID 32569327

  • Rare and common variant discovery in complex disease: the IBD case study. Human molecular genetics Venkataraman, G. R., Rivas, M. A. 2019


    Complex diseases such as inflammatory bowel disease (IBD), which consists of ulcerative colitis and Crohn's disease, are a significant medical burden: 70,000 new cases of IBD are diagnosed in the United States annually. In this Review, we examine the history of genetic variant discovery in complex disease with a focus on IBD. We cover methods that have been applied to microsatellite, common variant, targeted resequencing, and whole-exome and -genome data, specifically focusing on the progression of technologies towards rare-variant discovery. The inception of these methods combined with better availability of population level variation data has led to rapid discovery of IBD-causative and/or -associated variants at over 200 loci; over time, these methods have grown exponentially in both power and ascertainment to detect rare variation. We highlight rare-variant discoveries critical to the elucidation of the pathogenesis of IBD, including those in NOD2, IL23R, CARD9, RNF186, and ADCY7. We additionally identify the major areas of rare-variant discovery that will evolve in the coming years. A better understanding of the genetic basis of IBD and other complex diseases will lead to improved diagnosis, prognosis, treatment, and surveillance.

    View details for DOI 10.1093/hmg/ddz189

    View details for PubMedID 31363759

  • De novo mutations in autism implicate the synaptic elimination network. Pacific Symposium on Biocomputing Ram Venkataraman, G., O'Connell, C., Egawa, F., Kashef-Haghighi, D., Wall, D. P. 2016; 22: 521-532


    Autism has been shown to have a major genetic risk component; in documented families, autism has repeatedly been shown to be passed down through generations. While inherited risk plays an important role in autism in children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novo variants are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.

    View details for PubMedID 27897003