Polygenic risk modeling with latent trait-related genetic components.
European journal of human genetics : EJHG
Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.
View details for DOI 10.1038/s41431-021-00813-0
View details for PubMedID 33558700
Genetics of 35 blood and urine biomarkers in the UK Biobank.
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
View details for DOI 10.1038/s41588-020-00757-z
View details for PubMedID 33462484
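The biomarker abstract above does not say which Mendelian randomization estimator was used. As a hedged illustration of the general idea, the sketch below implements the standard fixed-effect inverse-variance weighted (IVW) estimator, which combines per-variant effects on an exposure (for example, a biomarker) and on an outcome (for example, a disease). The function name and the summary statistics are hypothetical.

```python
from math import sqrt

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure's effect on the outcome.

    estimate = sum(bx * by / se^2) / sum(bx^2 / se^2)
    std_err  = sqrt(1 / sum(bx^2 / se^2))
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, sqrt(1.0 / denominator)

# Hypothetical summary statistics for three genetic instruments.
beta_exposure = [0.1, 0.2, 0.4]    # variant effects on the exposure
beta_outcome = [0.05, 0.10, 0.20]  # variant effects on the outcome
se_outcome = [0.01, 0.02, 0.02]    # standard errors of the outcome effects

estimate, std_err = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW estimate: {estimate:.3f} (SE {std_err:.3f})")
```

In this toy input every outcome effect is exactly half the exposure effect, so the estimator recovers a slope of 0.5. Real analyses would also need checks for pleiotropy and weak instruments, which this sketch omits.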
Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma.
2020; 16 (5): e1008682
Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (beta = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10^-9 and 1.07 × 10^-13 for corneal compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10^-12 for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.
View details for DOI 10.1371/journal.pgen.1008682
View details for PubMedID 32369491
Automated Classification of Radiographic Knee Osteoarthritis Severity Using Deep Neural Networks.
Radiology. Artificial intelligence
2020; 2 (2): e190065
Purpose: To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists. Materials and Methods: Radiographs from the Osteoarthritis Initiative staged by a radiologist committee using the Kellgren-Lawrence (KL) system were used. Before using the images as input to a convolutional neural network model, they were standardized and augmented automatically. The model was trained with 32,116 images, tuned with 4,074 images, evaluated with a 4,090-image test set, and compared to two individual radiologists using a 50-image test subset. Saliency maps were generated to reveal features used by the model to determine KL grades. Results: With committee scores used as ground truth, the model had an average F1 score of 0.70 and an accuracy of 0.71 for the full test set. For the 50-image subset, the best individual radiologist had an average F1 score of 0.60 and an accuracy of 0.60; the model had an average F1 score of 0.64 and an accuracy of 0.66. Cohen weighted kappa between the committee and model was 0.86, comparable to intraexpert repeatability. Saliency maps identified sites of osteophyte formation as influential to predictions. Conclusion: An end-to-end interpretable model that takes full radiographs as input and predicts KL scores with state-of-the-art accuracy, performs as well as musculoskeletal radiologists, and does not require manual image preprocessing was developed. Saliency maps suggest the model's predictions were based on clinically relevant information. Supplemental material is available for this article. © RSNA, 2020.
View details for DOI 10.1148/ryai.2020190065
View details for PubMedID 32280948
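The knee osteoarthritis abstract above reports accuracy, an average F1 score, and a Cohen weighted kappa for five-level KL grades. The sketch below shows how such metrics can be computed from a confusion matrix. It assumes macro averaging for F1 and quadratic weights for kappa, neither of which the abstract specifies; the grade lists are hypothetical.

```python
def confusion_matrix(truth, pred, n_classes):
    """Rows are true grades, columns are predicted grades."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        m[t][p] += 1
    return m

def accuracy(m):
    total = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / total

def macro_f1(m):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    k = len(m)
    scores = []
    for i in range(k):
        tp = m[i][i]
        fp = sum(m[r][i] for r in range(k)) - tp
        fn = sum(m[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k

def quadratic_weighted_kappa(m):
    """Cohen's kappa with disagreement weights (i - j)^2 (assumed weighting)."""
    k = len(m)
    n = sum(map(sum, m))
    row = [sum(m[i]) for i in range(k)]
    col = [sum(m[r][j] for r in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2  # the usual 1/(k-1)^2 normalization cancels in the ratio
            observed += w * m[i][j]
            expected += w * row[i] * col[j] / n
    return 1.0 if expected == 0 else 1.0 - observed / expected

# Hypothetical KL grades (0-4) from a reference panel and from a model.
truth = [0, 1, 2, 3, 4, 2]
pred = [0, 1, 2, 3, 4, 3]
cm = confusion_matrix(truth, pred, n_classes=5)
print(f"accuracy={accuracy(cm):.2f}  macro F1={macro_f1(cm):.2f}  "
      f"weighted kappa={quadratic_weighted_kappa(cm):.2f}")
```

Weighted kappa penalizes a grade-2 versus grade-3 disagreement less than a grade-0 versus grade-4 one, which suits ordinal scales such as KL grading.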
FasTag: Automatic text classification of unstructured medical narratives.
2020; 15 (6): e0234647
Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories, with average weighted macro F1 scores of 0.74 and 0.68, respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70, respectively. Our LSTM method scored slightly higher than the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.
View details for DOI 10.1371/journal.pone.0234647
View details for PubMedID 32569327
Cardiac Imaging of Aortic Valve Area from 34,287 UK Biobank Participants Reveal Novel Genetic Associations and Shared Genetic Comorbidity with Multiple Disease Phenotypes.
Circulation. Genomic and precision medicine
Background - The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods - From a sample of 34,287 white British-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening to identify genetic comorbidities. Results - A genome-wide association study of aortic valve area in these UK Biobank participants showed three significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, p = 1.8 × 10^-9), rs35991305 (chr12:94191968, CRADD, p = 3.4 × 10^-8) and chr17:45013271:C:T (GOSR2, p = 5.6 × 10^-8). Replication on an independent set of 8,145 unrelated European-ancestry participants showed consistent effect sizes in all three loci, although rs35991305 did not meet nominal significance. We constructed a polygenic risk score for aortic valve area, which in a separate cohort of 311,728 individuals without imaging demonstrated that smaller aortic valve area is predictive of increased risk for aortic valve disease (odds ratio 1.14, p = 2.3 × 10^-6). After excluding subjects with a medical diagnosis of aortic valve stenosis (remaining n = 308,683 individuals), phenome-wide association of >10,000 traits showed multiple links between the polygenic score for aortic valve disease and key health-related comorbidities involving the cardiovascular system and autoimmune disease. Genetic correlation analysis supports a shared genetic etiology between aortic valve area and birthweight along with other cardiovascular conditions. Conclusions - These results illustrate the use of automated phenotyping of cardiac imaging data from the general population to investigate the genetic etiology of aortic valve disease, perform clinical prediction, and uncover new clinical and genetic correlates of cardiac anatomy.
View details for DOI 10.1161/CIRCGEN.120.003014
View details for PubMedID 33125279
Rare and common variant discovery in complex disease: the IBD case study.
Human molecular genetics
Complex diseases such as inflammatory bowel disease (IBD), which consists of ulcerative colitis and Crohn's disease, are a significant medical burden: 70,000 new cases of IBD are diagnosed in the United States annually. In this Review, we examine the history of genetic variant discovery in complex disease with a focus on IBD. We cover methods that have been applied to microsatellite, common variant, targeted resequencing, and whole-exome and -genome data, specifically focusing on the progression of technologies towards rare-variant discovery. The inception of these methods combined with better availability of population level variation data has led to rapid discovery of IBD-causative and/or -associated variants at over 200 loci; over time, these methods have grown exponentially in both power and ascertainment to detect rare variation. We highlight rare-variant discoveries critical to the elucidation of the pathogenesis of IBD, including those in NOD2, IL23R, CARD9, RNF186, and ADCY7. We additionally identify the major areas of rare-variant discovery that will evolve in the coming years. A better understanding of the genetic basis of IBD and other complex diseases will lead to improved diagnosis, prognosis, treatment, and surveillance.
View details for DOI 10.1093/hmg/ddz189
View details for PubMedID 31363759
DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2016; 22: 521-532
Autism has a major genetic risk component, and it has repeatedly been shown to be passed down through generations in families. While inherited risk plays an important role in autism, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novo variants are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.
View details for PubMedID 27897003
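The autism abstract above reports a Bonferroni-significant enrichment of de novo variants in a gene set but does not name the enrichment test. As a hedged illustration, the sketch below uses a one-sided hypergeometric test for gene-set overlap and then applies a Bonferroni threshold. The function names and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_set, n_hits, n_hits_in_set):
    """P(overlap >= n_hits_in_set) when n_hits genes are drawn without
    replacement from n_background genes, n_in_set of which are in the gene set."""
    upper = min(n_in_set, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_hits - k)
        for k in range(n_hits_in_set, upper + 1)
    )
    return tail / total

def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """A result passes if its p-value is below alpha divided by the number of tests."""
    return p_value < alpha / n_tests

# Hypothetical counts: 20,000 background genes, a 100-gene set,
# 500 genes carrying de novo variants, 15 of which fall in the set.
p = hypergeom_enrichment_p(20000, 100, 500, 15)
print(f"enrichment p-value: {p:.3g}")
print("Bonferroni-significant across 1,000 gene sets:",
      bonferroni_significant(p, n_tests=1000))
```

The Bonferroni correction is conservative: it controls the chance of any false positive across all gene sets tested, at the cost of power when many sets are examined.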