Manuel Rivas
Assistant Professor of Biomedical Data Science
Department of Biomedical Data Science
Honors & Awards
-
Clarendon Scholar, University of Oxford (2010-2015)
-
Osler Award, University of Oxford (2010-2015)
-
Gates Millennium Scholar, Bill & Melinda Gates Foundation (2004-2008)
Professional Education
-
DPhil, University of Oxford, Clinical Medicine (2015)
-
B.S., Massachusetts Institute of Technology, Mathematics (2008)
2024-25 Courses
- Generative AI in Healthcare
BIODS 295, DESIGN 266 (Spr)
- Surfing the Waves of Data
BIODS 296 (Win)
- Workshop in Biostatistics
BIODS 260C, STATS 260C (Spr)
Independent Studies (7)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIODS 299 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Master's Research
CME 291 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Ph.D. Research
CME 400 (Aut, Win, Spr, Sum)
- Ph.D. Research Rotation
CME 391 (Aut, Win, Spr, Sum)
Prior Year Courses
2023-24 Courses
- Generative AI in Healthcare
BIODS 295, DESIGN 266 (Spr)
- Workshop in Biostatistics
BIODS 260A, STATS 260A (Aut)
All Publications
-
Survival Analysis on Rare Events Using Group-Regularized Multi-Response Cox Regression.
Bioinformatics (Oxford, England)
2021
Abstract
MOTIVATION: The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. RESULTS: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare event response. This scenario is common in the UK Biobank (Sudlow et al., 2015) dataset where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. (2020). AVAILABILITY: https://github.com/rivas-lab/multisnpnet-Cox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btab095
View details for PubMedID 33560296
-
Polygenic risk modeling with latent trait-related genetic components.
European journal of human genetics : EJHG
2021
Abstract
Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.
View details for DOI 10.1038/s41431-021-00813-0
View details for PubMedID 33558700
-
Genetics of 35 blood and urine biomarkers in the UK Biobank.
Nature genetics
2021
Abstract
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n=363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n=135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
View details for DOI 10.1038/s41588-020-00757-z
View details for PubMedID 33462484
-
Graphical analysis for phenome-wide causal discovery in genotyped population-scale biobanks.
Nature communications
2021; 12 (1): 350
Abstract
Causal inference via Mendelian randomization requires making strong assumptions about horizontal pleiotropy, where genetic instruments are connected to the outcome not only through the exposure. Here, we present causal Graphical Analysis Using Genetics (cGAUGE), a pipeline that overcomes these limitations using instrument filters with provable properties. This is achievable by identifying conditional independencies while examining multiple traits. cGAUGE also uses ExSep (Exposure-based Separation), a novel test for the existence of causal pathways that does not require selecting instruments. In simulated data we illustrate how cGAUGE can reduce the empirical false discovery rate by up to 30%, while retaining the majority of true discoveries. On 96 complex traits from 337,198 subjects from the UK Biobank, our results cover expected causal links and many new ones that were previously suggested by correlation-based observational studies. Notably, we identify multiple risk factors for cardiovascular disease, including red blood cell distribution width.
View details for DOI 10.1038/s41467-020-20516-2
View details for PubMedID 33441555
-
Sex-specific genetic effects across biomarkers.
European journal of human genetics : EJHG
2020
Abstract
Sex differences have been shown in laboratory biomarkers; however, the extent to which this is due to genetics is unknown. In this study, we infer sex-specific genetic parameters (heritability and genetic correlation) across 33 quantitative biomarker traits in 181,064 females and 156,135 males from the UK Biobank study. We apply a Bayesian Mixture Model, Sex Effects Mixture Model (SEMM), to Genome-wide Association Study summary statistics in order to (1) estimate the contributions of sex to the genetic variance of these biomarkers and (2) identify variants whose statistical association with these traits is sex-specific. We find that the genetics of most biomarker traits are shared between males and females, with the notable exception of testosterone, where we identify 119 female-specific and 445 male-specific variants. These include protein-altering variants in steroid hormone production genes (POR, UGT2B7). Using the sex-specific variants as genetic instruments for Mendelian randomization, we find evidence for causal links between testosterone levels and height, body mass index, waist and hip circumference, and type 2 diabetes. We also show that sex-specific polygenic risk score models for testosterone outperform a combined model. Overall, these results demonstrate that while sex has a limited role in the genetics of most biomarker traits, sex plays an important role in testosterone genetics.
View details for DOI 10.1038/s41431-020-00712-w
View details for PubMedID 32873964
-
Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma
PLOS GENETICS
2020; 16 (5)
View details for DOI 10.1371/journal.pgen.1008682.r004
View details for Web of Science ID 000538052400007
-
Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank.
Biostatistics (Oxford, England)
2020
Abstract
We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in the memory. The output of our algorithm is the full Lasso path, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
View details for DOI 10.1093/biostatistics/kxaa038
View details for PubMedID 32989444
-
Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases.
American journal of human genetics
2020
Abstract
Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining together both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.
View details for DOI 10.1016/j.ajhg.2020.03.007
View details for PubMedID 32275883
-
A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.
PLoS genetics
2020; 16 (10): e1009141
Abstract
The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has proved to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimension seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports the ℓ1-penalized linear model, logistic regression, and Cox model, and also extends to the elastic net with ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.
View details for DOI 10.1371/journal.pgen.1009141
View details for PubMedID 33095761
-
Homogeneity in the association of body mass index with type 2 diabetes across the UK Biobank: A Mendelian randomization study.
PLoS medicine
2019; 16 (12): e1002982
Abstract
BACKGROUND: Lifestyle interventions to reduce body mass index (BMI) are critical public health strategies for type 2 diabetes prevention. While weight loss interventions have shown demonstrable benefit for high-risk and prediabetic individuals, we aimed to determine whether the same benefits apply to those at lower risk. METHODS AND FINDINGS: We performed a multi-stratum Mendelian randomization study of the effect size of BMI on diabetes odds in 287,394 unrelated individuals of self-reported white British ancestry in the UK Biobank, who were recruited from across the United Kingdom from 2006 to 2010 when they were between the ages of 40 and 69 years. Individuals were stratified on the following diabetes risk factors: BMI, diabetes family history, and genome-wide diabetes polygenic risk score. The main outcome measure was the odds ratio of diabetes per 1-kg/m² BMI reduction, in the full cohort and in each stratum. Diabetes prevalence increased sharply with BMI, family history of diabetes, and genetic risk. Conversely, predicted risk reduction from weight loss was strikingly similar across BMI and genetic risk categories. Weight loss was predicted to substantially reduce diabetes odds even among lower-risk individuals: for instance, a 1-kg/m² BMI reduction was associated with a 1.37-fold reduction (95% CI 1.12-1.68) in diabetes odds among non-overweight individuals (BMI < 25 kg/m²) without a family history of diabetes, similar to that in obese individuals (BMI ≥ 30 kg/m²) with a family history (1.21-fold reduction, 95% CI 1.13-1.29). A key limitation of this analysis is that the BMI-altering DNA sequence polymorphisms it studies represent cumulative predisposition over an individual's entire lifetime, and may consequently incorrectly estimate the risk modification potential of weight loss interventions later in life. CONCLUSIONS: In a population-scale cohort, lower BMI was consistently associated with reduced diabetes risk across BMI, family history, and genetic risk categories, suggesting all individuals can substantially reduce their diabetes risk through weight loss. Our results support the broad deployment of weight loss interventions to individuals at all levels of diabetes risk.
View details for DOI 10.1371/journal.pmed.1002982
View details for PubMedID 31821322
-
Rare and common variant discovery in complex disease: the IBD case study.
Human molecular genetics
2019
Abstract
Complex diseases such as inflammatory bowel disease (IBD), which consists of ulcerative colitis and Crohn's disease, are a significant medical burden - 70,000 new cases of IBD are diagnosed in the United States annually. In this Review, we examine the history of genetic variant discovery in complex disease with a focus on IBD. We cover methods that have been applied to microsatellite, common variant, targeted resequencing, and whole-exome and -genome data, specifically focusing on the progression of technologies towards rare-variant discovery. The inception of these methods combined with better availability of population level variation data has led to rapid discovery of IBD-causative and/or -associated variants at over 200 loci; over time, these methods have grown exponentially in both power and ascertainment to detect rare variation. We highlight rare-variant discoveries critical to the elucidation of the pathogenesis of IBD, including those in NOD2, IL23R, CARD9, RNF186, and ADCY7. We additionally identify the major areas of rare-variant discovery that will evolve in the coming years. A better understanding of the genetic basis of IBD and other complex diseases will lead to improved diagnosis, prognosis, treatment, and surveillance.
View details for DOI 10.1093/hmg/ddz189
View details for PubMedID 31363759
-
Phenome-wide Burden of Copy-Number Variation in the UK Biobank.
American journal of human genetics
2019
Abstract
Copy-number variations (CNVs) represent a significant proportion of the genetic differences between individuals and many CNVs associate causally with syndromic disease and clinical outcomes. Here, we characterize the landscape of copy-number variation and their phenome-wide effects in a sample of 472,228 array-genotyped individuals from the UK Biobank. In addition to population-level selection effects against genic loci conferring high mortality, we describe genetic burden from potentially pathogenic and previously uncharacterized CNV loci across more than 3,000 quantitative and dichotomous traits, with separate analyses for common and rare classes of variation. Specifically, we highlight the effects of CNVs at two well-known syndromic loci 16p11.2 and 22q11.2, previously uncharacterized variation at 9p23, and several genic associations in the context of acute coronary artery disease and high body mass index. Our data constitute a deeply contextualized portrait of population-wide burden of copy-number variation, as well as a series of dosage-mediated genic associations across the medical phenome.
View details for DOI 10.1016/j.ajhg.2019.07.001
View details for PubMedID 31353025
-
Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics
BIOINFORMATICS
2019; 35 (14): 2495–97
View details for DOI 10.1093/bioinformatics/bty999
View details for Web of Science ID 000477703600091
-
Opportunities and challenges for transcriptome-wide association studies.
Nature genetics
2019; 51 (4): 592-599
Abstract
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.
View details for DOI 10.1038/s41588-019-0385-z
View details for PubMedID 30926968
-
Opportunities and challenges for transcriptome-wide association studies
NATURE GENETICS
2019; 51 (4): 592–99
View details for DOI 10.1038/s41588-019-0385-z
View details for Web of Science ID 000462767500005
-
Significant shared heritability underlies suicide attempt and clinically predicted probability of attempting suicide.
Molecular psychiatry
2019
Abstract
Suicide accounts for nearly 800,000 deaths per year worldwide with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets to characterize the contribution of common genetic variation to suicide attempt. The first is a patient reported suicide attempt phenotype asked as part of an online mental health survey taken by a subset of participants (n=157,366) in the UK Biobank. After quality control, we leveraged a genotyped set of unrelated, white British ancestry participants including 2433 cases and 334,766 controls that included those that did not participate in the survey or were not explicitly asked about attempting suicide. The second leveraged electronic health record (EHR) data from the Vanderbilt University Medical Center (VUMC, 2.8 million patients, 3250 cases) and machine learning to derive probabilities of attempting suicide in 24,546 genotyped patients. We identified significant and comparable heritability estimates of suicide attempt from both the patient reported phenotype in the UK Biobank (h²SNP=0.035, p=7.12×10⁻⁴) and the clinically predicted phenotype from VUMC (h²SNP=0.046, p=1.51×10⁻²). A significant genetic overlap was demonstrated between the two measures of suicide attempt in these independent samples through polygenic risk score analysis (t=4.02, p=5.75×10⁻⁵) and genetic correlation (rg=1.073, SE=0.36, p=0.003). Finally, we show significant but incomplete genetic correlation of suicide attempt with insomnia (rg=0.34-0.81) as well as several psychiatric disorders (rg=0.26-0.79). This work demonstrates the contribution of common genetic variation to suicide attempt. It points to a genetic underpinning to clinically predicted risk of attempting suicide that is similar to the genetic profile from a patient reported outcome. Lastly, it presents an approach for using EHR data and clinical prediction to generate quantitative measures from binary phenotypes that can improve power for genetic studies.
View details for PubMedID 30610202
-
Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology.
Nature communications
2019; 10 (1): 4064
Abstract
Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.
View details for DOI 10.1038/s41467-019-11953-9
View details for PubMedID 31492854
-
Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics.
Bioinformatics (Oxford, England)
2018
Abstract
Summary: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests, and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. Availability and implementation: GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.
View details for PubMedID 30520965
-
DeepTag: inferring diagnoses from veterinary clinical notes.
NPJ digital medicine
2018; 1: 60
Abstract
Large scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resource to annotate patient records with standard medical diagnostic codes and most veterinary visits are captured in free-text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impediment to cross-species translational research, which relies on the ability to accurately identify patient cohorts with specific diagnostic criteria in humans and animals. In order to reduce the coding burden for veterinary clinical practice and aid translational research, we have developed a deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free-text notes. DeepTag is trained on a newly curated dataset of 112,558 veterinary notes manually annotated by experts. DeepTag extends multitask LSTM with an improved hierarchical objective that captures the semantic structures between diseases. To foster human-machine collaboration, DeepTag also learns to abstain in examples when it is uncertain and defers them to human experts, resulting in improved performance. DeepTag accurately infers disease codes from free-text even in challenging cross-hospital settings where the text comes from different clinical settings than the ones used for training. It enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing. The technical framework in this work can be applied in other medical domains that currently lack medical coding resources.
View details for DOI 10.1038/s41746-018-0067-8
View details for PubMedID 31304339
View details for PubMedCentralID PMC6550285
-
Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study
NATURE COMMUNICATIONS
2018; 9: 1612
Abstract
Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. Here, we characterize the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and find 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We perform phenome-wide analyses and directly measure the effect in homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated as being protective against disease or associated with at least one phenotype in our study. We find several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.
View details for PubMedID 29691392
-
Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population.
PLoS genetics
2018; 14 (5): e1007329
Abstract
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 148 AJ enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10-100 fold differences in prevalence between AJ and non-AJ populations of some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease, a common disease with an estimated two to four-fold excess prevalence in AJ. We specifically attempt to evaluate whether strong acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder-effect, contribute excess genetic risk to Crohn's disease in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2 are enriched in AJ (p < 0.005), including several novel contributing alleles, and show evidence of association to CD. Independently, we find that genomewide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10⁻¹⁶). Taken together, the results suggest coordinated selection in AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/) to pinpoint genetic variation that contributes to variable disease predisposition across populations.
View details for PubMedID 29795570
-
A protein-truncating R179X variant in RNF186 confers protection against ulcerative colitis
NATURE COMMUNICATIONS
2016; 7
Abstract
Protein-truncating variants protective against human disease provide in vivo validation of therapeutic targets. Here we used targeted sequencing to conduct a search for protein-truncating variants conferring protection against inflammatory bowel disease exploiting knowledge of common variants associated with the same disease. Through replication genotyping and imputation we found that a predicted protein-truncating variant (rs36095412, p.R179X, genotyped in 11,148 ulcerative colitis patients and 295,446 controls, MAF up to 0.78%) in RNF186, a single-exon ring finger E3 ligase with strong colonic expression, protects against ulcerative colitis (overall P=6.89 × 10⁻⁷, odds ratio=0.30). We further demonstrate that the truncated protein exhibits reduced expression and altered subcellular localization, suggesting the protective mechanism may reside in the loss of an interaction or function via mislocalization and/or loss of an essential transmembrane domain.
View details for DOI 10.1038/ncomms12342
View details for Web of Science ID 000380952600001
View details for PubMedID 27503255
View details for PubMedCentralID PMC4980482
-
Discovery of rare variants for complex phenotypes
HUMAN GENETICS
2016; 135 (6): 625-634
Abstract
With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.
View details for DOI 10.1007/s00439-016-1679-1
View details for Web of Science ID 000377017000005
View details for PubMedID 27221085
-
Assessing allele-specific expression across multiple tissues from RNA-seq read data
BIOINFORMATICS
2015; 31 (15): 2497-2504
Abstract
RNA sequencing enables allele-specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression (GTEx) project is collecting RNA-seq data on multiple tissues of a same set of individuals and novel methods are required for the analysis of these data. We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on strong ASE effects that we are expecting to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example on a tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally.
View details for DOI 10.1093/bioinformatics/btv074
View details for Web of Science ID 000359312400011
View details for PubMedID 25819081
View details for PubMedCentralID PMC4514921
-
Effect of predicted protein-truncating genetic variants on the human transcriptome
SCIENCE
2015; 348 (6235): 666-669
Abstract
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
View details for DOI 10.1126/science.1261877
View details for Web of Science ID 000354045700038
View details for PubMedCentralID PMC4537935
-
The Power of Gene-Based Rare Variant Methods to Detect Disease-Associated Variation and Test Hypotheses About Complex Disease
PLOS GENETICS
2015; 11 (4)
Abstract
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5 × 10^-6) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
View details for DOI 10.1371/journal.pgen.1005165
View details for Web of Science ID 000354524200049
View details for PubMedID 25906071
View details for PubMedCentralID PMC4407972
-
Choice of transcripts and software has a large effect on variant annotation
GENOME MEDICINE
2014; 6
Abstract
Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail. This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the RefSeq and Ensembl transcript sets as the basis for variant annotation with the software Annovar, and also compare the results from two annotation software packages, Annovar and VEP (Ensembl's Variant Effect Predictor), when using Ensembl transcripts. We found only 44% agreement in annotations for putative loss-of-function variants when using the RefSeq and Ensembl transcript sets as the basis for annotation with Annovar. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from Annovar and VEP using Ensembl transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by Annovar and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies. Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
View details for DOI 10.1186/gm543
View details for Web of Science ID 000339377700001
View details for PubMedID 24944579
View details for PubMedCentralID PMC4062061
-
Assessing association between protein truncating variants and quantitative traits
BIOINFORMATICS
2013; 29 (19): 2419-2426
Abstract
In sequencing studies of common diseases and quantitative traits, power to test rare and low frequency variants individually is weak. To improve power, a common approach is to combine statistical evidence from several genetic variants in a region. Major challenges are how to do the combining and which statistical framework to use. General approaches for testing association between rare variants and quantitative traits include aggregating genotypes and trait values, referred to as 'collapsing', or using a score-based variance component test. However, little attention has been paid to alternative models tailored for protein truncating variants. Recent studies have highlighted the important role that protein truncating variants, commonly referred to as 'loss of function' variants, may have on disease susceptibility and quantitative levels of biomarkers. We propose a Bayesian modelling framework for the analysis of protein truncating variants and quantitative traits. Our simulation results show that our models have an advantage over the commonly used methods. We apply our models to sequence and exome-array data and discover strong evidence of association between low plasma triglyceride levels and protein truncating variants at APOC3 (Apolipoprotein C3). Software is available from http://www.well.ox.ac.uk/~rivas/mamba
View details for DOI 10.1093/bioinformatics/btt409
View details for Web of Science ID 000324778500008
View details for PubMedID 23860716
View details for PubMedCentralID PMC3777107
-
Transcriptome and genome sequencing uncovers functional variation in humans.
Nature
2013; 501 (7468): 506-511
Abstract
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
View details for DOI 10.1038/nature12531
View details for PubMedID 24037378
-
Deep Resequencing of GWAS Loci Identifies Rare Variants in CARD9, IL23R and RNF186 That Are Associated with Ulcerative Colitis
PLOS GENETICS
2013; 9 (9)
Abstract
Genome-wide association studies and follow-up meta-analyses in Crohn's disease (CD) and ulcerative colitis (UC) have recently identified 163 disease-associated loci that meet genome-wide significance for these two inflammatory bowel diseases (IBD). These discoveries have already had a tremendous impact on our understanding of the genetic architecture of these diseases and have directed functional studies that have revealed some of the biological functions that are important to IBD (e.g. autophagy). Nonetheless, these loci can only explain a small proportion of disease variance (~14% in CD and 7.5% in UC), suggesting that not only are additional loci to be found but that the known loci may contain high effect rare risk variants that have gone undetected by GWAS. To test this, we have used a targeted sequencing approach in 200 UC cases and 150 healthy controls (HC), all of French Canadian descent, to study 55 genes in regions associated with UC. We performed follow-up genotyping of 42 rare non-synonymous variants in independent case-control cohorts (totaling 14,435 UC cases and 20,204 HC). Our results confirmed significant association to rare non-synonymous coding variants in both IL23R and CARD9, previously identified from sequencing of CD loci, as well as identified a novel association in RNF186. With the exception of CARD9 (OR = 0.39), the rare non-synonymous variants identified were of moderate effect (OR = 1.49 for RNF186 and OR = 0.79 for IL23R). RNF186 encodes a protein with a RING domain having predicted E3 ubiquitin-protein ligase activity and two transmembrane domains. Importantly, the disease-coding variant is located in the ubiquitin ligase domain. Finally, our results suggest that rare variants in genes identified by genome-wide association in UC are unlikely to contribute significantly to the overall variance for the disease. Rather, these are expected to help focus functional studies of the corresponding disease loci.
View details for DOI 10.1371/journal.pgen.1003723
View details for Web of Science ID 000325076600010
View details for PubMedID 24068945
View details for PubMedCentralID PMC3772057
-
A Flexible Approach for the Analysis of Rare Variants Allowing for a Mixture of Effects on Binary or Quantitative Traits
PLOS GENETICS
2013; 9 (8)
Abstract
Multiple rare variants either within or across genes have been hypothesised to collectively influence complex human traits. The increasing availability of high throughput sequencing technologies offers the opportunity to study the effect of rare variants on these traits. However, appropriate and computationally efficient analytical methods are required to account for collections of rare variants that display a combination of protective, deleterious and null effects on the trait. We have developed a novel method for the analysis of rare genetic variation in a gene, region or pathway that, by simply aggregating summary statistics at each variant, can: (i) test for the presence of a mixture of effects on a trait; (ii) be applied to both binary and quantitative traits in population-based and family-based data; (iii) adjust for covariates to allow for non-genetic risk factors and; (iv) incorporate imputed genetic variation. In addition, for preliminary identification of promising genes, the method can be applied to association summary statistics, available from meta-analysis of published data, for example, without the need for individual level genotype data. Through simulation, we show that our method is immune to the presence of bi-directional effects, with no apparent loss in power across a range of different mixtures, and can achieve greater power than existing approaches as long as summary statistics at each variant are robust. We apply our method to investigate association of type-1 diabetes with imputed rare variants within genes in the major histocompatibility complex using genotype data from the Wellcome Trust Case Control Consortium.
View details for DOI 10.1371/journal.pgen.1003694
View details for Web of Science ID 000323830300045
View details for PubMedID 23966874
View details for PubMedCentralID PMC3744430
-
Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease
NATURE GENETICS
2011; 43 (11): 1066-U50
Abstract
More than 1,000 susceptibility loci have been identified through genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings have not yet been defined. Here we used pooled next-generation sequencing to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls. Through follow-up genotyping of 70 rare and low-frequency protein-altering variants in nine independent case-control series (16,054 Crohn's disease cases, 12,153 ulcerative colitis cases and 17,575 healthy controls), we identified four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a protective splice variant in CARD9 (P < 1 × 10^-16, odds ratio ≈ 0.29) and additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by identifying new, rare and probably functional variants that could aid functional experiments and predictive models.
View details for DOI 10.1038/ng.952
View details for Web of Science ID 000296584000009
View details for PubMedID 21983784
-
A framework for variation discovery and genotyping using next-generation DNA sequencing data
NATURE GENETICS
2011; 43 (5): 491-?
Abstract
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
View details for DOI 10.1038/ng.806
View details for Web of Science ID 000289972600023
View details for PubMedID 21478889
View details for PubMedCentralID PMC3083463
-
Testing for an Unusual Distribution of Rare Variants
PLOS GENETICS
2011; 7 (3)
Abstract
Technological advances make it possible to use high-throughput sequencing as a primary discovery tool of medical genetics, specifically for assaying rare variation. Still this approach faces the analytic challenge that the influence of very rare variants can only be evaluated effectively as a group. A further complication is that any given rare variant could have no effect, could increase risk, or could be protective. We propose here the C-alpha test statistic as a novel approach for testing for the presence of this mixture of effects across a set of rare variants. Unlike existing burden tests, C-alpha, by testing the variance rather than the mean, maintains consistent power when the target set contains both risk and protective variants. Through simulations and analysis of case/control data, we demonstrate good power relative to existing methods that assess the burden of rare variants in individuals.
View details for DOI 10.1371/journal.pgen.1001322
View details for Web of Science ID 000288996600004
View details for PubMedID 21408211
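The variance-based idea in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the form in which the C-alpha statistic is commonly written: for each variant, compare the squared deviation of its case count from the binomial expectation with the binomial variance, sum over variants, and standardise. The function name `c_alpha`, its arguments, and the use of a normal approximation for the p-value are choices made here for illustration, not details taken from the abstract.

```python
from collections import Counter
from math import comb, erfc, sqrt


def c_alpha(case_counts, total_counts, p0):
    """Sketch of a C-alpha style test (hypothetical helper, not the authors' code).

    case_counts[i]  -- copies of variant i observed in cases
    total_counts[i] -- copies of variant i observed in cases plus controls
    p0              -- expected fraction of copies in cases under the null
    Returns (T, Z, one-sided upper-tail p-value).
    """
    # T: excess of the observed squared deviation over the binomial variance.
    t = sum((y - n * p0) ** 2 - n * p0 * (1 - p0)
            for y, n in zip(case_counts, total_counts))

    # Null variance of T: expected squared per-variant term, summed over variants.
    c = 0.0
    for n, m in Counter(total_counts).items():  # m variants share total count n
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            c += m * ((u - n * p0) ** 2 - n * p0 * (1 - p0)) ** 2 * prob

    if c == 0:
        # Happens, for example, when every variant is a singleton and p0 = 0.5.
        raise ValueError("null variance is zero; pool very rare variants first")

    z = t / sqrt(c)
    p_value = 0.5 * erfc(z / sqrt(2))  # normal approximation, upper tail
    return t, z, p_value


# Two variants seen only in cases and two seen only in controls:
# the mean burden is balanced, but the spread is far from binomial.
t, z, p = c_alpha([4, 0, 4, 0], [4, 4, 4, 4], p0=0.5)
```

A burden-style test would see nothing unusual in this toy input, since cases and controls carry the same number of copies overall; the variance-based statistic is large because individual variants are split much more unevenly than chance predicts.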
-
Narcolepsy risk loci outline role of T cell autoimmunity and infectious triggers in narcolepsy.
Nature communications
2023; 14 (1): 2709
Abstract
Narcolepsy type 1 (NT1) is caused by a loss of hypocretin/orexin transmission. Risk factors include pandemic 2009 H1N1 influenza A infection and immunization with Pandemrix®. Here, we dissect disease mechanisms and interactions with environmental triggers in a multi-ethnic sample of 6,073 cases and 84,856 controls. We fine-mapped GWAS signals within HLA (DQ0602, DQB1*03:01 and DPB1*04:02) and discovered seven novel associations (CD207, NAB1, IKZF4-ERBB3, CTSC, DENND1B, SIRPG, PRF1). Significant signals at TRA and DQB1*06:02 loci were found in 245 vaccination-related cases, who also shared polygenic risk. T cell receptor associations in NT1 modulated TRAJ*24, TRAJ*28 and TRBV*4-2 chain-usage. Partitioned heritability and immune cell enrichment analyses found genetic signals to be driven by dendritic and helper T cells. Lastly comorbidity analysis using data from FinnGen, suggests shared effects between NT1 and other autoimmune diseases. NT1 genetic variants shape autoimmunity and response to environmental triggers, including influenza A infection and immunization with Pandemrix®.
View details for DOI 10.1038/s41467-023-36120-z
View details for PubMedID 37188663
View details for PubMedCentralID PMC10185546
-
SGLT2 inhibitor ameliorates endothelial dysfunction associated with the common ALDH2 alcohol flushing variant.
Science translational medicine
2023; 15 (680): eabp9952
Abstract
The common aldehyde dehydrogenase 2 (ALDH2) alcohol flushing variant known as ALDH2*2 affects ∼8% of the world's population. Even in heterozygous carriers, this missense variant leads to a severe loss of ALDH2 enzymatic activity and has been linked to an increased risk of coronary artery disease (CAD). Endothelial cell (EC) dysfunction plays a determining role in all stages of CAD pathogenesis, including early-onset CAD. However, the contribution of ALDH2*2 to EC dysfunction and its relation to CAD are not fully understood. In a large genome-wide association study (GWAS) from Biobank Japan, ALDH2*2 was found to be one of the strongest single-nucleotide polymorphisms associated with CAD. Clinical assessment of endothelial function showed that human participants carrying ALDH2*2 exhibited impaired vasodilation after light alcohol drinking. Using human induced pluripotent stem cell-derived ECs (iPSC-ECs) and CRISPR-Cas9-corrected ALDH2*2 iPSC-ECs, we modeled ALDH2*2-induced EC dysfunction in vitro, demonstrating an increase in oxidative stress and inflammatory markers and a decrease in nitric oxide (NO) production and tube formation capacity, which was further exacerbated by ethanol exposure. We subsequently found that sodium-glucose cotransporter 2 inhibitors (SGLT2i) such as empagliflozin mitigated ALDH2*2-associated EC dysfunction. Studies in ALDH2*2 knock-in mice further demonstrated that empagliflozin attenuated ALDH2*2-mediated vascular dysfunction in vivo. Mechanistically, empagliflozin inhibited Na+/H+-exchanger 1 (NHE-1) and activated AKT kinase and endothelial NO synthase (eNOS) pathways to ameliorate ALDH2*2-induced EC dysfunction. Together, our results suggest that ALDH2*2 induces EC dysfunction and that SGLT2i may potentially be used as a preventative measure against CAD for ALDH2*2 carriers.
View details for DOI 10.1126/scitranslmed.abp9952
View details for PubMedID 36696485
-
LARGE-SCALE MULTIVARIATE SPARSE REGRESSION WITH APPLICATIONS TO UK BIOBANK
ANNALS OF APPLIED STATISTICS
2022; 16 (3): 1891-1918
View details for DOI 10.1214/21-AOAS1575
View details for Web of Science ID 000828472200030
-
Deconvoluting complex correlates of COVID-19 severity with a multi-omic pandemic tracking strategy.
Nature communications
2022; 13 (1): 5107
Abstract
The SARS-CoV-2 pandemic has differentially impacted populations across race and ethnicity. A multi-omic approach represents a powerful tool to examine risk across multi-ancestry genomes. We leverage a pandemic tracking strategy in which we sequence viral and host genomes and transcriptomes from nasopharyngeal swabs of 1049 individuals (736 SARS-CoV-2 positive and 313 SARS-CoV-2 negative) and integrate them with digital phenotypes from electronic health records from a diverse catchment area in Northern California. Genome-wide association disaggregated by admixture mapping reveals novel COVID-19-severity-associated regions containing previously reported markers of neurologic, pulmonary and viral disease susceptibility. Phylodynamic tracking of consensus viral genomes reveals no association with disease severity or inferred ancestry. Summary data from multiomic investigation reveals metagenomic and HLA associations with severe COVID-19. The wealth of data available from residual nasopharyngeal swabs in combination with clinical data abstracted automatically at scale highlights a powerful strategy for pandemic tracking, and reveals distinct epidemiologic, genetic, and biological associations for those at the highest risk.
View details for DOI 10.1038/s41467-022-32397-8
View details for PubMedID 36042219
-
Large-scale sequencing identifies multiple genes and rare variants associated with Crohn's disease susceptibility.
Nature genetics
2022
Abstract
Genome-wide association studies (GWASs) have identified hundreds of loci associated with Crohn's disease (CD). However, as with all complex diseases, robust identification of the genes dysregulated by noncoding variants typically driving GWAS discoveries has been challenging. Here, to complement GWASs and better define actionable biological targets, we analyzed sequence data from more than 30,000 patients with CD and 80,000 population controls. We directly implicate ten genes in general onset CD for the first time to our knowledge via association to coding variation, four of which lie within established CD GWAS loci. In nine instances, a single coding variant is significantly associated, and in the tenth, ATG4C, we see additionally a significantly increased burden of very rare coding variants in CD cases. In addition to reiterating the central role of innate and adaptive immune cells as well as autophagy in CD pathogenesis, these newly associated genes highlight the emerging role of mesenchymal cells in the development and maintenance of intestinal inflammation.
View details for DOI 10.1038/s41588-022-01156-2
View details for PubMedID 36038634
-
High heritability of ascending aortic diameter and trans-ancestry prediction of thoracic aortic disease.
Nature genetics
2022
Abstract
Enlargement of the aorta is an important risk factor for aortic aneurysm and dissection, a leading cause of morbidity in the developed world. Here we performed automated extraction of ascending aortic diameter from cardiac magnetic resonance images of 36,021 individuals from the UK Biobank, followed by genome-wide association. We identified lead variants across 41 loci, including genes related to cardiovascular development (HAND2, TBX20) and Mendelian forms of thoracic aortic disease (ELN, FBN1). A polygenic score significantly predicted prevalent risk of thoracic aortic aneurysm and the need for surgical intervention for patients with thoracic aneurysm across multiple ancestries within the UK Biobank, FinnGen, the Penn Medicine Biobank and the Million Veterans Program (MVP). Additionally, we highlight the primary causal role of blood pressure in reducing aortic dilation using Mendelian randomization. Overall, our findings provide a roadmap for using genetic determinants of human anatomy to understand cardiovascular development while improving prediction of diseases of the thoracic aorta.
View details for DOI 10.1038/s41588-022-01070-7
View details for PubMedID 35637384
-
Opportunities and challenges for the use of common controls in sequencing studies.
Nature reviews. Genetics
2022
Abstract
Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.
View details for DOI 10.1038/s41576-022-00487-4
View details for PubMedID 35581355
-
Integration of rare expression outlier-associated variants improves polygenic risk prediction.
American journal of human genetics
2022
Abstract
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10^-14), 62.3% increase in risk for severe obesity (p = 1 × 10^-6), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10^-15). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
View details for DOI 10.1016/j.ajhg.2022.04.015
View details for PubMedID 35588732
-
Cannabinoid receptor 1 antagonist genistein attenuates marijuana-induced vascular inflammation.
Cell
2022
Abstract
Epidemiological studies reveal that marijuana increases the risk of cardiovascular disease (CVD); however, little is known about the mechanism. Δ9-tetrahydrocannabinol (Δ9-THC), the psychoactive component of marijuana, binds to cannabinoid receptor 1 (CB1/CNR1) in the vasculature and is implicated in CVD. A UK Biobank analysis found that cannabis was a risk factor for CVD. We found that marijuana smoking activated inflammatory cytokines implicated in CVD. In silico virtual screening identified genistein, a soybean isoflavone, as a putative CB1 antagonist. Human-induced pluripotent stem cell-derived endothelial cells were used to model Δ9-THC-induced inflammation and oxidative stress via NF-κB signaling. Knockdown of the CB1 receptor with siRNA, CRISPR interference, and genistein attenuated the effects of Δ9-THC. In mice, genistein blocked Δ9-THC-induced endothelial dysfunction in wire myograph, reduced atherosclerotic plaque, and had minimal penetration of the central nervous system. Genistein is a CB1 antagonist that attenuates Δ9-THC-induced atherosclerosis.
View details for DOI 10.1016/j.cell.2022.04.005
View details for PubMedID 35489334
-
Significant sparse polygenic risk scores across 813 traits in UK Biobank.
PLoS genetics
2022; 18 (3): e1010105
Abstract
We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits, ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
View details for DOI 10.1371/journal.pgen.1010105
View details for PubMedID 35324888
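As a rough illustration of how sparse PRS weights of the kind described in the abstract above might be applied, the sketch below computes a score as a weighted sum of allele dosages, touching only the variants whose weight is non-zero. The function name `polygenic_score`, the toy dosages, and the weights are all hypothetical; they are not taken from the paper or from the Global Biobank Engine, and the published models also include covariates that this sketch omits.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual: sum over variants of dosage * weight.

    dosages: (n_individuals, n_variants) array of allele counts (0, 1, 2).
    weights: (n_variants,) array of per-variant effect weights; a sparse
             model leaves most of these at exactly zero.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be 2-D with one column per weight")
    # Only the selected (non-zero weight) variants contribute to the score.
    active = np.flatnonzero(weights)
    return dosages[:, active] @ weights[active]


if __name__ == "__main__":
    # Toy data (hypothetical): two individuals, four variants, two selected.
    toy_dosages = [[0, 1, 2, 0],
                   [2, 0, 1, 1]]
    toy_weights = [0.5, 0.0, -0.25, 0.0]
    print(polygenic_score(toy_dosages, toy_weights))
```

Restricting the product to the non-zero weights is what makes a sparse model cheap to apply: the cost scales with the number of selected variants rather than with the full genotype matrix.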
-
Bayesian model comparison for rare-variant association studies.
American journal of human genetics
2021
Abstract
Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
View details for DOI 10.1016/j.ajhg.2021.11.005
View details for PubMedID 34822764
-
A cross-population atlas of genetic associations for 220 human phenotypes.
Nature genetics
2021
Abstract
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n_total = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
View details for DOI 10.1038/s41588-021-00931-x
View details for PubMedID 34594039
-
APOC3 genetic variation, serum triglycerides, and risk of coronary artery disease in Asian Indians, Europeans, and other ethnic groups.
Lipids in health and disease
2021; 20 (1): 113
Abstract
BACKGROUND: Hypertriglyceridemia has emerged as a critical coronary artery disease (CAD) risk factor. Rare loss-of-function (LoF) variants in apolipoprotein C-III have been reported to reduce triglycerides (TG) and are cardioprotective in American Indians and Europeans. However, there is a lack of data in other Europeans and non-Europeans. Also, whether genetically increased plasma TG due to ApoC-III is causally associated with increased CAD risk is still unclear and inconsistent. The objectives of this study were to verify the cardioprotective role of six previously reported LoF variants of APOC3 in South Asians and other multi-ethnic cohorts and to evaluate the causal association of TG-raising common variants with increased CAD risk. METHODS: We performed gene-centric and Mendelian randomization analyses and evaluated the role of genetic variation encompassing APOC3 for affecting circulating TG and the risk for developing CAD. RESULTS: One rare LoF variant (rs138326449) with a 37% reduction in TG was associated with lowered risk for CAD in Europeans (p = 0.007), but we could not confirm this association in Asian Indians (p = 0.641). Our data could not validate the cardioprotective role of the other five LoF variants analysed. A common variant rs5128 in the APOC3 was strongly associated with elevated TG levels showing a p-value 2.8 × 10⁻⁴²⁴. Measures of plasma ApoC-III in a small subset of Sikhs revealed a 37% increase in ApoC-III concentrations among homozygous mutant carriers than the wild-type carriers of rs5128. A genetically instrumented per 1 SD increment of plasma TG level of 15 mg/dL would cause a mild increase (3%) in the risk for CAD (p = 0.042). CONCLUSIONS: Our results highlight the challenges of inclusion of rare variant information in clinical risk assessment and the generalizability of implementation of ApoC-III inhibition for treating atherosclerotic disease.
More studies would be needed to confirm whether genetically raised TG and ApoC-III concentrations would increase CAD risk.
View details for DOI 10.1186/s12944-021-01531-8
View details for PubMedID 34548093
-
Mapping the human genetic architecture of COVID-19.
Nature
2021
Abstract
The genetic makeup of an individual contributes to susceptibility and response to viral infection. While environmental, clinical and social factors play a role in exposure to SARS-CoV-2 and COVID-19 disease severity [1,2], host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. We describe the results of three genome-wide association meta-analyses comprised of up to 49,562 COVID-19 patients from 46 studies across 19 countries. We reported 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations to lung or autoimmune and inflammatory diseases [3-7]. They also represent potentially actionable mechanisms in response to infection. Mendelian Randomization analyses support a causal role for smoking and body mass index for severe COVID-19 although not for type II diabetes. The identification of novel host genetic factors associated with COVID-19, with unprecedented speed, was made possible by the community of human genetic researchers coming together to prioritize sharing of data, results, resources and analytical frameworks. This working model of international collaboration underscores what is possible for future genetic discoveries in emerging pandemics, or indeed for any complex human disease.
View details for DOI 10.1038/s41586-021-03767-x
View details for PubMedID 34237774
-
Fast Numerical Optimization for Genome Sequencing Data in Population Biobanks.
Bioinformatics (Oxford, England)
2021
Abstract
MOTIVATION: Large-scale and high-dimensional genome sequencing data poses computational challenges. General purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data. RESULTS: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0, 1, 2, NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces memory requirement by a factor of 32 compared to a double precision floating point representation. Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce memory requirement and computation time. Our sparse genetic matrix implementation uses both the compact 2-bit representation and a simplified version of compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet, and will also be included as part of the snpnet R package. Our implementation is able to solve Lasso and group Lasso, linear, logistic and Cox regression problems on sparse genetic matrices that contain 1,000,000 variants and almost 100,000 individuals within 10 minutes and using less than 32 GB of memory. AVAILABILITY: https://github.com/rivas-lab/snpnet/tree/compact.
View details for DOI 10.1093/bioinformatics/btab452
View details for PubMedID 34146108
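The factor-of-32 saving quoted in the abstract above follows from storing each genotype in 2 bits instead of a 64-bit double (64 / 2 = 32). The sketch below shows one way such packing could look in NumPy, with four genotype codes per byte. It is only an illustration: the helper names and the choice of code 3 for NA are assumptions made here, and the actual snpnet implementation and its bit layout may differ.

```python
import numpy as np

# Assumed encoding for this sketch: 0, 1, 2 are allele counts, 3 marks NA.
NA_CODE = 3
_SHIFTS = np.array([0, 2, 4, 6], dtype=np.uint8)  # bit offset of each slot


def pack_genotypes(codes):
    """Pack codes in {0, 1, 2, 3} into a uint8 array, four codes per byte."""
    codes = np.asarray(codes, dtype=np.uint8)
    if codes.size and codes.max() > 3:
        raise ValueError("codes must be in {0, 1, 2, 3}")
    pad = (-codes.size) % 4  # pad with zeros up to a multiple of four
    padded = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)])
    padded = padded.reshape(-1, 4)
    # Shift each code into its 2-bit slot, then OR the four slots together.
    return np.bitwise_or.reduce(padded << _SHIFTS, axis=1).astype(np.uint8)


def unpack_genotypes(packed, n):
    """Recover the first n 2-bit codes from a packed uint8 array."""
    packed = np.asarray(packed, dtype=np.uint8)
    codes = (packed[:, None] >> _SHIFTS) & 3
    return codes.reshape(-1)[:n]


if __name__ == "__main__":
    genotypes = np.array([0, 1, 2, NA_CODE, 2, 1], dtype=np.uint8)
    packed = pack_genotypes(genotypes)
    restored = unpack_genotypes(packed, genotypes.size)
    print(packed.nbytes, "bytes packed;", restored)
```

Because the number of entries is not stored in the packed array, the caller has to keep `n` alongside it; a real file format would typically record the dimensions in a header.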
-
Exome Sequencing in Patient-Parent Trios Suggests New Candidate Genes for Early-onset Primary Sclerosing Cholangitis.
Liver international : official journal of the International Association for the Study of the Liver
2021
Abstract
BACKGROUND & AIMS: Primary sclerosing cholangitis (PSC) is a rare bile duct disease strongly associated with inflammatory bowel disease (IBD). Whole-exome sequencing (WES) has contributed to understanding the molecular basis of very early-onset IBD, but rare protein-altering genetic variants have not been identified for early-onset PSC. We performed WES in patients diagnosed with PSC ≤ 12 years to investigate the contribution of rare genetic variants to early-onset PSC. METHODS: In this multicenter study, WES was performed on 87 DNA samples from 29 patient-parent trios with early-onset PSC. We selected rare (minor allele frequency < 2%) coding and splice-site variants that matched recessive (homozygous and compound heterozygous variants) and dominant (de novo) inheritance in the index patients. Variant pathogenicity was predicted by an in-house developed algorithm (GAVIN), and PSC-relevant variants were selected using gene expression data and gene function. RESULTS: In 22 out of 29 trios we identified at least 1 possibly pathogenic variant. We prioritized 36 genes, harboring a total of 54 variants with predicted pathogenic effects. In 18 genes we identified 36 compound heterozygous variants, whereas in the other 18 genes we identified 18 de novo variants. Twelve of 36 candidate risk genes are known to play a role in transmembrane transport, adaptive and innate immunity, and epithelial barrier function. CONCLUSIONS: The 36 candidate genes for early-onset PSC need further verification in other patient cohorts and evaluation of gene function before a causal role can be attributed to their variants.
View details for DOI 10.1111/liv.14831
View details for PubMedID 33590606
-
GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background.
eLife
2021; 10
Abstract
Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. We describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) with better-understood biology than most other complex traits. We find that many of the most significant hits are readily interpretable. We observe huge enrichment of associations near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of each trait, including differences in testosterone regulation between females and males. At the same time, even these molecular traits are highly polygenic, with many thousands of variants spread across the genome contributing to trait variance. In summary, for these three molecular traits we identify strong enrichment of signal in putative core gene sets, even while most of the SNP-based heritability is driven by a massively polygenic background.
View details for DOI 10.7554/eLife.58615
View details for PubMedID 33587031
-
A regulatory variant at 3q21.1 confers an increased pleiotropic risk for hyperglycemia and altered bone mineral density.
Cell metabolism
2021
Abstract
Skeletal and glycemic traits have shared etiology, but the underlying genetic factors remain largely unknown. To identify genetic loci that may have pleiotropic effects, we studied genome-wide association studies (GWASs) for bone mineral density and glycemic traits and identified a bivariate risk locus at 3q21. Using sequence and epigenetic modeling, we prioritized an adenylate cyclase 5 (ADCY5) intronic causal variant, rs56371916. This SNP changes the binding affinity of SREBP1 and leads to differential ADCY5 gene expression, altering the chromatin landscape from poised to repressed. These alterations result in bone- and type 2 diabetes-relevant cell-autonomous changes in lipid metabolism in osteoblasts and adipocytes. We validated our findings by directly manipulating the regulator SREBP1, the target gene ADCY5, and the variant rs56371916, which together imply a novel link between fatty acid oxidation and osteoblast differentiation. Our work, by systematic functional dissection of pleiotropic GWAS loci, represents a framework to uncover biological mechanisms affecting pleiotropic traits.
View details for DOI 10.1016/j.cmet.2021.01.001
View details for PubMedID 33513366
-
Nonsense-mediated decay is highly stable across individuals and tissues.
American journal of human genetics
2021
Abstract
Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
View details for DOI 10.1016/j.ajhg.2021.06.008
View details for PubMedID 34216550
-
Association of accelerometer-derived sleep measures with lifetime psychiatric diagnoses: A cross-sectional study of 89,205 participants from the UK Biobank.
PLoS medicine
2021; 18 (10): e1003782
Abstract
Sleep problems are both symptoms of and modifiable risk factors for many psychiatric disorders. Wrist-worn accelerometers enable objective measurement of sleep at scale. Here, we aimed to examine the association of accelerometer-derived sleep measures with psychiatric diagnoses and polygenic risk scores in a large community-based cohort. In this post hoc cross-sectional analysis of the UK Biobank cohort, 10 interpretable sleep measures (bedtime, wake-up time, sleep duration, wake after sleep onset, sleep efficiency, number of awakenings, duration of longest sleep bout, number of naps, and variability in bedtime and sleep duration) were derived from 7-day accelerometry recordings across 89,205 participants (aged 43 to 79, 56% female, 97% self-reported white) taken between 2013 and 2015. These measures were examined for association with lifetime inpatient diagnoses of major depressive disorder, anxiety disorders, bipolar disorder/mania, and schizophrenia spectrum disorders from any time before the date of accelerometry, as well as polygenic risk scores for major depression, bipolar disorder, and schizophrenia. Covariates consisted of age and season at the time of the accelerometry recording, sex, Townsend deprivation index (an indicator of socioeconomic status), and the top 10 genotype principal components. We found that sleep pattern differences were ubiquitous across diagnoses: each diagnosis was associated with a median of 8.5 of the 10 accelerometer-derived sleep measures, with measures of sleep quality (for instance, sleep efficiency) generally more affected than mere sleep duration. Effect sizes were generally small: for instance, the largest magnitude effect size across the 4 diagnoses was β = -0.11 (95% confidence interval -0.13 to -0.10, p = 3 × 10⁻⁵⁶, FDR = 6 × 10⁻⁵⁵) for the association between lifetime inpatient major depressive disorder diagnosis and sleep efficiency.
Associations largely replicated across ancestries and sexes, and accelerometry-derived measures were concordant with self-reported sleep properties. Limitations include the use of accelerometer-based sleep measurement and the time lag between psychiatric diagnoses and accelerometry. In this study, we observed that sleep pattern differences are a transdiagnostic feature of individuals with lifetime mental illness, suggesting that they should be considered regardless of diagnosis. Accelerometry provides a scalable way to objectively measure sleep properties in psychiatric clinical research and practice, even across tens of thousands of individuals.
View details for DOI 10.1371/journal.pmed.1003782
View details for PubMedID 34637446
-
Efficient Computation and Analysis of Distributional Shapley Values
MICROTOME PUBLISHING. 2021
View details for Web of Science ID 000659893801002
-
Sleep apnoea is a risk factor for severe COVID-19.
BMJ open respiratory research
2021; 8 (1)
Abstract
BACKGROUND: Obstructive sleep apnoea (OSA) is associated with higher body mass index (BMI), diabetes, older age and male gender, which are all risk factors for severe COVID-19. We aimed to study if OSA is an independent risk factor for COVID-19 infection or for severe COVID-19. METHODS: OSA diagnosis and COVID-19 infection were extracted from the hospital discharge, causes of death and infectious diseases registries in individuals who participated in the FinnGen study (n = 260,405). Severe COVID-19 was defined as COVID-19 requiring hospitalisation. Multivariate logistic regression model was used to examine association. Comorbidities for either COVID-19 or OSA were selected as covariates. We performed a meta-analysis with previous studies. RESULTS: We identified 445 individuals with COVID-19, and 38 (8.5%) of them with OSA of whom 19 out of 91 (20.9%) were hospitalised. OSA was associated with COVID-19 hospitalisation independent from age, sex, BMI and comorbidities (p-unadjusted = 5.13 × 10⁻⁵, OR-adjusted = 2.93 (95% CI 1.02 to 8.39), p-adjusted = 0.045). OSA was not associated with the risk of contracting COVID-19 (p = 0.25). A meta-analysis of OSA and severe COVID-19 showed association across 15,835 COVID-19 positive controls, and n = 1,294 patients with OSA with severe COVID-19 (OR = 2.37 (95% CI 1.14 to 4.95), p = 0.021). CONCLUSION: Risk for contracting COVID-19 was the same for patients with OSA and those without OSA. In contrast, among COVID-19 positive patients, OSA was associated with higher risk for hospitalisation. Our findings are in line with earlier works and suggest OSA as an independent risk factor for severe COVID-19.
View details for DOI 10.1136/bmjresp-2020-000845
View details for PubMedID 33436406
-
Combining Clinical and Polygenic Risk Improves Stroke Prediction Among Individuals with Atrial Fibrillation.
Circulation. Genomic and precision medicine
2021
Abstract
Background - Atrial fibrillation (AF) is associated with a five-fold increased risk of ischemic stroke. A portion of this risk is heritable; however, current risk stratification tools (CHA2DS2-VASc) do not include family history or genetic risk. We hypothesized that we could improve ischemic stroke prediction in patients with AF by incorporating polygenic risk scores (PRS). Methods - Using data from the largest available GWAS in Europeans, we combined over half a million genetic variants to construct a PRS to predict ischemic stroke in patients with AF. We externally validated this PRS in independent data from the UK Biobank, both independently and integrated with clinical risk factors. The integrated PRS and clinical risk factors risk tool had the greatest predictive ability. Results - Compared with the currently recommended risk tool (CHA2DS2-VASc), the integrated tool significantly improved net reclassification (NRI: 2.3% (95% CI: 1.3% to 3.0%)), and fit (χ² P = 0.002). Using this improved tool, >115,000 people with AF would have improved risk classification in the US. Independently, PRS was a significant predictor of ischemic stroke in patients with AF prospectively (Hazard Ratio: 1.13 per 1 SD (95% CI: 1.06 to 1.23)). Lastly, polygenic risk scores were uncorrelated with clinical risk factors (Pearson's correlation coefficient: -0.018). Conclusions - In patients with AF, there appears to be a significant association between PRS and risk of ischemic stroke. The greatest predictive ability was found with the integration of PRS and clinical risk factors; however, the prediction of stroke remains challenging.
View details for DOI 10.1161/CIRCGEN.120.003168
View details for PubMedID 34029116
-
Time trajectories in the transcriptomic response to exercise - a meta-analysis.
Nature communications
2021; 12 (1): 3471
Abstract
Exercise training prevents multiple diseases, yet the molecular mechanisms that drive exercise adaptation are incompletely understood. To address this, we create a computational framework comprising data from skeletal muscle or blood from 43 studies, including 739 individuals before and after exercise or training. Using linear mixed effects meta-regression, we detect specific time patterns and regulatory modulators of the exercise response. Acute and long-term responses are transcriptionally distinct and we identify SMAD3 as a central regulator of the exercise response. Exercise induces a more pronounced inflammatory response in skeletal muscle of older individuals and our models reveal multiple sex-associated responses. We validate seven of our top genes in a separate human cohort. In this work, we provide a powerful resource (www.extrameta.org) that expands the transcriptional landscape of exercise adaptation by extending previously known responses and their regulatory networks, and identifying novel modality-, time-, age-, and sex-associated changes.
View details for DOI 10.1038/s41467-021-23579-x
View details for PubMedID 34108459
-
Significant shared heritability underlies suicide attempt and clinically predicted probability of attempting suicide
MOLECULAR PSYCHIATRY
2020; 25 (10): 2422–30
View details for DOI 10.1038/s41380-018-0326-8
View details for Web of Science ID 000572540100019
-
Race, socioeconomic deprivation, and hospitalization for COVID-19 in English participants of a national biobank.
International journal for equity in health
2020; 19 (1): 114
Abstract
Preliminary reports suggest that the Coronavirus Disease 2019 (COVID-19) pandemic has led to disproportionate morbidity and mortality among historically disadvantaged populations. We investigate the racial and socioeconomic associations of COVID-19 hospitalization among 418,794 participants of the UK Biobank, of whom 549 (0.13%) had been hospitalized. Both Black participants (odds ratio 3.7; 95%CI 2.5-5.3) and Asian participants (odds ratio 2.2; 95%CI 1.5-3.2) were at substantially increased risk as compared to White participants. We further observed a striking gradient in COVID-19 hospitalization rates according to the Townsend Deprivation Index - a composite measure of socioeconomic deprivation - and household income. Adjusting for socioeconomic factors and cardiorespiratory comorbidities led to only modest attenuation of the increased risk in Black participants, adjusted odds ratio 2.4 (95%CI 1.5-3.7). These observations confirm and extend earlier preliminary and lay press reports of higher morbidity in non-White individuals in the context of a large population of participants in a national biobank. The extent to which this increased risk relates to variation in pre-existing comorbidities, differences in testing or hospitalization patterns, or additional disparities in social determinants of health warrants further study.
View details for DOI 10.1186/s12939-020-01227-y
View details for PubMedID 32631328
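The odds ratios and 95% confidence intervals quoted in the abstract above come from the study's own analyses, and the adjusted figures require a regression model. For readers unfamiliar with the quantity, the minimal sketch below computes an unadjusted odds ratio with a Wald-type interval on the log scale from a 2×2 table. The function name and the example counts are hypothetical and are not data from the paper.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile for the interval (1.96 gives roughly 95%).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Invented counts, for illustration only.
    estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The interval is symmetric on the log scale, which is why published intervals such as those above look lopsided around the point estimate on the original scale.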
-
Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise.
Cell
2020; 181 (7): 1464–74
Abstract
Exercise provides a robust physiological stimulus that evokes cross-talk among multiple tissues that when repeated regularly (i.e., training) improves physiological capacity, benefits numerous organ systems, and decreases the risk for premature mortality. However, a gap remains in identifying the detailed molecular signals induced by exercise that benefits health and prevents disease. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) was established to address this gap and generate a molecular map of exercise. Preclinical and clinical studies will examine the systemic effects of endurance and resistance exercise across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. From this multi-omic and bioinformatic analysis, a molecular map of exercise will be established. Altogether, MoTrPAC will provide a public database that is expected to enhance our understanding of the health benefits of exercise and to provide insight into how physical activity mitigates disease.
View details for DOI 10.1016/j.cell.2020.06.004
View details for PubMedID 32589957
-
Race, Socioeconomic Deprivation, and Hospitalization for COVID-19 in English participants of a National Biobank.
medRxiv : the preprint server for health sciences
2020
Abstract
Preliminary reports suggest that the Coronavirus Disease 2019 (COVID-19) pandemic has led to disproportionate morbidity and mortality among historically disadvantaged populations. The extent to which these disparities are related to socioeconomic versus biologic factors is largely unknown. We investigate the racial and socioeconomic associations of COVID-19 hospitalization among 418,794 participants of the UK Biobank, of whom 549 (0.13%) had been hospitalized. Both black participants (odds ratio 3.4; 95%CI 2.4-4.9) and Asian participants (odds ratio 2.1; 95%CI 1.5-3.2) were at substantially increased risk as compared to white participants. We further observed a striking gradient in COVID-19 hospitalization rates according to the Townsend Deprivation Index - a composite measure of socioeconomic deprivation - and household income. Adjusting for such factors led to only modest attenuation of the increased risk in black participants, adjusted odds ratio 3.1 (95%CI 2.0-4.8). These observations confirm and extend earlier preliminary and lay press reports of higher morbidity in non-white individuals in the context of a large population of participants in a national biobank. The extent to which this increased risk relates to variation in pre-existing comorbidities, differences in testing or hospitalization patterns, or additional disparities in social determinants of health warrants further study.
View details for DOI 10.1101/2020.04.27.20082107
View details for PubMedID 32511642
-
Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma.
PLoS genetics
2020; 16 (5): e1008682
Abstract
Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (beta = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10⁻⁹ and 1.07 × 10⁻¹³ for corneal compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10⁻¹² for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.
View details for DOI 10.1371/journal.pgen.1008682
View details for PubMedID 32369491
-
Cardiac Imaging of Aortic Valve Area from 34,287 UK Biobank Participants Reveal Novel Genetic Associations and Shared Genetic Comorbidity with Multiple Disease Phenotypes.
Circulation. Genomic and precision medicine
2020
Abstract
Background - The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods - From a sample of 34,287 white British-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening to identify genetic comorbidities. Results - A genome-wide association study of aortic valve area in these UK Biobank participants showed three significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, p = 1.8 × 10⁻⁹), rs35991305 (chr12:94191968, CRADD, p = 3.4 × 10⁻⁸) and chr17:45013271:C:T (GOSR2, p = 5.6 × 10⁻⁸). Replication on an independent set of 8,145 unrelated European-ancestry participants showed consistent effect sizes in all three loci, although rs35991305 did not meet nominal significance. We constructed a polygenic risk score for aortic valve area, which in a separate cohort of 311,728 individuals without imaging demonstrated that smaller aortic valve area is predictive of increased risk for aortic valve disease (Odds Ratio 1.14, p = 2.3 × 10⁻⁶). After excluding subjects with a medical diagnosis of aortic valve stenosis (remaining n = 308,683 individuals), phenome-wide association of >10,000 traits showed multiple links between the polygenic score for aortic valve disease and key health-related comorbidities involving the cardiovascular system and autoimmune disease. Genetic correlation analysis supports a shared genetic etiology between aortic valve area and birthweight along with other cardiovascular conditions.
Conclusions - These results illustrate the use of automated phenotyping of cardiac imaging data from the general population to investigate the genetic etiology of aortic valve disease, perform clinical prediction, and uncover new clinical and genetic correlates of cardiac anatomy.
View details for DOI 10.1161/CIRCGEN.120.003014
View details for PubMedID 33125279
-
FasTag: Automatic text classification of unstructured medical narratives.
PloS one
2020; 15 (6): e0234647
Abstract
Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.
View details for DOI 10.1371/journal.pone.0234647
View details for PubMedID 32569327
-
The role of polygenic risk and susceptibility genes in breast cancer over the course of life.
Nature communications
2020; 11 (1): 6383
Abstract
Polygenic risk scores (PRS) for breast cancer have potential to improve risk prediction, but there is limited information on their utility in various clinical situations. Here we show that among 122,978 women in the FinnGen study with 8401 breast cancer cases, the PRS modifies the breast cancer risk of two high-impact frameshift risk variants. Similarly, we show that after the breast cancer diagnosis, individuals with elevated PRS have an elevated risk of developing contralateral breast cancer, and that the PRS can considerably improve risk assessment among their female first-degree relatives. In more detail, women with the c.1592delT variant in PALB2 (242-fold enrichment in Finland, 336 carriers) and an average PRS (10-90th percentile) have a lifetime risk of breast cancer at 55% (95% CI 49-61%), which increases to 84% (71-97%) with a high PRS (>90th percentile), and decreases to 49% (30-68%) with a low PRS (<10th percentile). Similarly, for c.1100delC in CHEK2 (3.7-fold enrichment; 1648 carriers), the respective lifetime risks are 29% (27-32%), 59% (52-66%), and 9% (5-14%). The PRS also refines the risk assessment of women with first-degree relatives diagnosed with breast cancer, particularly among women with positive family history of early-onset breast cancer. Here we demonstrate the opportunities for a comprehensive way of assessing genetic risk in the general population, in breast cancer patients, and in unaffected family members.
View details for DOI 10.1038/s41467-020-19966-5
View details for PubMedID 33318493
-
Whole exome sequencing analyses reveal gene-microbiota interactions in the context of IBD.
Gut
2020
Abstract
Both the gut microbiome and host genetics are known to play significant roles in the pathogenesis of IBD. However, the interaction between these two factors and its implications in the aetiology of IBD remain underexplored. Here, we report on the influence of host genetics on the gut microbiome in IBD. To evaluate the impact of host genetics on the gut microbiota of patients with IBD, we combined whole exome sequencing of the host genome and whole genome shotgun sequencing of 1464 faecal samples from 525 patients with IBD and 939 population-based controls. We followed a four-step analysis: (1) exome-wide microbial quantitative trait loci (mbQTL) analyses, (2) a targeted approach focusing on IBD-associated genomic regions and protein truncating variants (PTVs, minor allele frequency (MAF) >5%), (3) gene-based burden tests on PTVs with MAF <5% and exome copy number variations (CNVs) with site frequency <1%, (4) joint analysis of both cohorts to identify the interactions between disease and host genetics. We identified 12 mbQTLs, including variants in the IBD-associated genes IL17REL, MYRF, SEC16A and WDR78. For example, the decrease of the pathway acetyl-coenzyme A biosynthesis, which is involved in short chain fatty acids production, was associated with variants in the gene MYRF (false discovery rate <0.05). Changes in functional pathways involved in the metabolic potential were also observed in participants carrying rare PTVs or CNVs in CYP2D6, GPR151 and CD160 genes. These genes are known for their function in the immune system. Moreover, interaction analyses confirmed previously known IBD disease-specific mbQTLs in TNFSF15. This study highlights that both common and rare genetic variants affecting the immune system are key factors in shaping the gut microbiota in the context of IBD and points towards potential mechanisms for disease treatment.
View details for DOI 10.1136/gutjnl-2019-319706
View details for PubMedID 32651235
-
A phenome-wide association study of 26 mendelian genes reveals phenotypic expressivity of common and rare variants within the general population.
PLoS genetics
2020; 16 (11): e1008802
Abstract
The clinical evaluation of a genetic syndrome relies upon recognition of a characteristic pattern of signs or symptoms to guide targeted genetic testing for confirmation of the diagnosis. However, individuals displaying a single phenotype of a complex syndrome may not meet criteria for clinical diagnosis or genetic testing. Here, we present a phenome-wide association study (PheWAS) approach to systematically explore the phenotypic expressivity of common and rare alleles in genes associated with four well-described syndromic diseases (Alagille (AS), Marfan (MS), DiGeorge (DS), and Noonan (NS) syndromes) in the general population. Using human phenotype ontology (HPO) terms, we systematically mapped 60 phenotypes related to AS, MS, DS and NS in 337,198 unrelated white British individuals from the UK Biobank (UKBB) based on their hospital admission records, self-administered questionnaires, and physiological measurements. We performed logistic regression adjusting for age, sex, and the first 5 genetic principal components, for each phenotype and each variant in the target genes (JAG1, NOTCH2, FBN1, PTPN11 and RAS-opathy genes, and genes in the 22q11.2 locus) and performed a gene burden test. Overall, we observed multiple phenotype-genotype correlations, such as the association between variation in JAG1, FBN1, PTPN11 and SOS2 with diastolic and systolic blood pressure; and pleiotropy among multiple variants in syndromic genes. For example, rs11066309 in PTPN11 was significantly associated with a lower body mass index, an increased risk of hypothyroidism and a smaller size for gestational age, all in concordance with NS-related phenotypes. Similarly, rs589668 in FBN1 was associated with an increase in body height and blood pressure, and a reduced body fat percentage as observed in Marfan syndrome.
Our findings suggest that the spectrum of associations of common and rare variants in genes involved in syndromic diseases can be extended to individual phenotypes within the general population.
View details for DOI 10.1371/journal.pgen.1008802
View details for PubMedID 33226994
-
Genetic architecture of human plasma lipidome and its link to cardiovascular disease.
Nature communications
2019; 10 (1): 4329
Abstract
Understanding genetic architecture of plasma lipidome could provide better insights into lipid metabolism and its link to cardiovascular diseases (CVDs). Here, we perform genome-wide association analyses of 141 lipid species (n=2,181 individuals), followed by phenome-wide scans with 25 CVD related phenotypes (n=511,700 individuals). We identify 35 lipid-species-associated loci (P<5×10^-8), 10 of which associate with CVD risk including five new loci: COL5A1, GLTPD2, SPTLC3, MBOAT7 and GALNT16 (false discovery rate<0.05). We identify loci for lipid species that are shown to predict CVD e.g., SPTLC3 for CER(d18:1/24:1). We show that lipoprotein lipase (LPL) may more efficiently hydrolyze medium length triacylglycerides (TAGs) than others. Polyunsaturated lipids have highest heritability and genetic correlations, suggesting considerable genetic regulation at fatty acids levels. We find low genetic correlations between traditional lipids and lipid species. Our results show that lipidomic profiles capture information beyond traditional lipids and identify genetic variants modifying lipid levels and risk of CVD.
View details for DOI 10.1038/s41467-019-11954-8
View details for PubMedID 31551469
-
Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution
NATURE GENETICS
2019; 51 (3): 452-+
View details for DOI 10.1038/s41588-018-0334-2
View details for Web of Science ID 000459947200014
-
Association of Genetic Variants in NUDT15 With Thiopurine-Induced Myelosuppression in Patients With Inflammatory Bowel Disease.
JAMA
2019; 321 (8): 773–85
Abstract
Importance: Use of thiopurines may be limited by myelosuppression. TPMT pharmacogenetic testing identifies only 25% of at-risk patients of European ancestry. Among patients of East Asian ancestry, NUDT15 variants are associated with thiopurine-induced myelosuppression (TIM). Objective: To identify genetic variants associated with TIM among patients of European ancestry with inflammatory bowel disease (IBD). Design, Setting, and Participants: Case-control study of 491 patients affected by TIM and 679 thiopurine-tolerant unaffected patients who were recruited from 89 international sites between March 2012 and November 2015. Genome-wide association studies (GWAS) and exome-wide association studies (EWAS) were conducted in patients of European ancestry. The replication cohort comprised 73 patients affected by TIM and 840 thiopurine-tolerant unaffected patients. Exposures: Genetic variants associated with TIM. Main Outcomes and Measures: Thiopurine-induced myelosuppression, defined as a decline in absolute white blood cell count to 2.5×10^9/L or less or a decline in absolute neutrophil cell count to 1.0×10^9/L or less leading to a dose reduction or drug withdrawal. Results: Among 1077 patients (398 affected and 679 unaffected; median age at IBD diagnosis, 31.0 years [interquartile range, 21.2 to 44.1 years]; 540 [50%] women; 602 [56%] diagnosed as having Crohn disease), 919 (311 affected and 608 unaffected) were included in the GWAS analysis and 961 (328 affected and 633 unaffected) in the EWAS analysis. The GWAS analysis confirmed association of TPMT (chromosome 6, rs11969064) with TIM (30.5% [95/311] affected vs 16.4% [100/608] unaffected patients; odds ratio [OR], 2.3 [95% CI, 1.7 to 3.1], P=5.2×10^-9).
The EWAS analysis demonstrated an association with an in-frame deletion in NUDT15 (chromosome 13, rs746071566) and TIM (5.8% [19/328] affected vs 0.2% [1/633] unaffected patients; OR, 38.2 [95% CI, 5.1 to 286.1], P=1.3×10^-8), which was replicated in a different cohort (2.7% [2/73] affected vs 0.2% [2/840] unaffected patients; OR, 11.8 [95% CI, 1.6 to 85.0], P=.03). Carriage of any of 3 coding NUDT15 variants was associated with an increased risk (OR, 27.3 [95% CI, 9.3 to 116.7], P=1.1×10^-7) of TIM, independent of TPMT genotype and thiopurine dose. Conclusions and Relevance: Among patients of European ancestry with IBD, variants in NUDT15 were associated with increased risk of TIM. These findings suggest that NUDT15 genotyping may be considered prior to initiation of thiopurine therapy; however, further study including additional validation in independent cohorts is required.
View details for PubMedID 30806694
-
Association of Genetic Variants in NUDT15 With Thiopurine-Induced Myelosuppression in Patients With Inflammatory Bowel Disease
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION
2019; 321 (8): 773–85
View details for DOI 10.1001/jama.2019.0709
View details for Web of Science ID 000460191400019
-
Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution.
Nature genetics
2019
Abstract
Body-fat distribution is a risk factor for adverse cardiovascular health consequences. We analyzed the association of body-fat distribution, assessed by waist-to-hip ratio adjusted for body mass index, with 228,985 predicted coding and splice site variants available on exome arrays in up to 344,369 individuals from five major ancestries (discovery) and 132,177 European-ancestry individuals (validation). We identified 15 common (minor allele frequency, MAF ≥5%) and nine low-frequency or rare (MAF <5%) coding novel variants. Pathway/gene set enrichment analyses identified lipid particle, adiponectin, abnormal white adipose tissue physiology and bone development and morphology as important contributors to fat distribution, while cross-trait associations highlight cardiometabolic traits. In functional follow-up analyses, specifically in Drosophila RNAi-knockdowns, we observed a significant increase in the total body triglyceride levels for two genes (DNAH10 and PLXND1). We implicate novel genes in fat distribution, stressing the importance of interrogating low-frequency and protein-coding variants.
View details for PubMedID 30778226
-
DeepTag: inferring diagnoses from veterinary clinical notes
NPJ DIGITAL MEDICINE
2018; 1
View details for DOI 10.1038/s41746-018-0067-8
View details for Web of Science ID 000449685400001
-
Large-Scale Phenome-Wide Association Study of PCSK9 Variants Demonstrates Protection Against Ischemic Stroke
CIRCULATION-GENOMIC AND PRECISION MEDICINE
2018; 11 (7): e002162
Abstract
PCSK9 inhibition is a potent new therapy for hypercholesterolemia and cardiovascular disease. Although short-term clinical trial results have not demonstrated major adverse effects, long-term data will not be available for some time. Genetic studies in large biobanks offer a unique opportunity to predict drug effects and provide context for the evaluation of future clinical trial outcomes. We tested the association of the PCSK9 missense variant rs11591147 with predefined phenotypes and phenome-wide, in 337 536 individuals of British ancestry in the UK Biobank, with independent discovery and replication. Using a Bayesian statistical method, we leveraged phenotype correlations to evaluate the phenome-wide impact of PCSK9 inhibition with higher power at a finer resolution. The T allele of rs11591147 showed a protective effect on hyperlipidemia (odds ratio, 0.63±0.04; P=2.32×10^-38), coronary heart disease (odds ratio, 0.73±0.09; P=1.05×10^-6), and ischemic stroke (odds ratio, 0.61±0.18; P=2.40×10^-3) and was associated with increased type 2 diabetes mellitus risk adjusted for lipid-lowering medication status (odds ratio, 1.24±0.10; P=1.98×10^-7). We did not observe associations with cataracts, heart failure, atrial fibrillation, and cognitive dysfunction. Leveraging phenotype correlations, we observed evidence of a protective association with cerebral infarction and vascular occlusion. These results explore the effects of direct PCSK9 inhibition; off-target effects cannot be predicted using this approach. This result represents the first genetic evidence in a large cohort for the protective effect of PCSK9 inhibition on ischemic stroke and corroborates exploratory evidence from clinical trials. PCSK9 inhibition was not associated with variables other than those related to LDL (low-density lipoprotein) cholesterol, atherosclerosis, and type 2 diabetes mellitus, suggesting that other effects are either small or absent.
View details for PubMedID 29997226
-
Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides
GENOME RESEARCH
2018; 28 (7): 968–74
Abstract
Variation in RNA splicing (i.e., alternative splicing) plays an important role in many diseases. Variants near 5' and 3' splice sites often affect splicing, but the effects of these variants on splicing and disease have not been fully characterized beyond the two "essential" splice nucleotides flanking each exon. Here we provide quantitative measurements of tolerance to mutational disruptions by position and reference allele-alternative allele combinations. We show that certain reference alleles are particularly sensitive to mutations, regardless of the alternative alleles into which they are mutated. Using public RNA-seq data, we demonstrate that individuals carrying such variants have significantly lower levels of the correctly spliced transcript, compared to individuals without them, and confirm that these specific substitutions are highly enriched for known Mendelian mutations. Our results propose a more refined definition of the "splice region" and offer a new way to prioritize and provide functional interpretation of variants identified in diagnostic sequencing and association studies.
View details for PubMedID 29858273
-
Quantifying the Impact of Rare and Ultra-rare Coding Variation across the Phenotypic Spectrum.
American journal of human genetics
2018
Abstract
There is a limited understanding about the impact of rare protein-truncating variants across multiple phenotypes. We explore the impact of this class of variants on 13 quantitative traits and 10 diseases using whole-exome sequencing data from 100,296 individuals. Protein-truncating variants in genes intolerant to this class of mutations increased risk of autism, schizophrenia, bipolar disorder, intellectual disability, and ADHD. In individuals without these disorders, there was an association with shorter height, lower education, increased hospitalization, and reduced age at enrollment. Gene sets implicated from GWASs did not show a significant protein-truncating variants burden beyond what was captured by established Mendelian genes. In conclusion, we provide a thorough investigation of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.
View details for PubMedID 29861106
-
Genetic variants in cellular transport do not affect mesalamine response in ulcerative colitis
PLOS ONE
2018; 13 (3): e0192806
Abstract
Mesalamine is commonly used to treat ulcerative colitis (UC). Although mesalamine acts topically, in vitro data suggest that intracellular transport is required for its beneficial effect. Genetic variants in mucosal transport proteins may affect this uptake, but the clinical relevance of these variants has not been studied. The aim of this study was to determine whether variants in genes involved in cellular transport affect the response to mesalamine in UC. Subjects with UC from a 6-week clinical trial using multiple doses of mesalamine were genotyped using a genome-wide array that included common exome variants. Analysis focused on cellular transport gene variants with a minor allele frequency >5%. Mesalamine response was defined as improvement in Week 6 Physician's Global Assessment (PGA) and non-response as a lack of improvement in Week 6 PGA. Quality control thresholds included an individual genotyping rate of >90%, SNP genotyping rate of >98%, and exclusion for subjects with cryptic relatedness. All included variants met Hardy-Weinberg equilibrium (p>0.001). 457 adults with UC were included with 280 responders and 177 non-responders. There were no common variants in transporter genes that were associated with response to mesalamine. The genetic risk score of responders was similar to that of non-responders (p = 0.18). Genome-wide variants demonstrating a trend towards mesalamine response included ST8SIA5 (p = 1×10^-5). Common transporter gene variants did not affect response to mesalamine in adult UC. The response to mesalamine may be due to rare genetic events or environmental factors such as the intestinal microbiome.
View details for PubMedID 29579042
-
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls (vol 4, 170179, 2017)
SCIENTIFIC DATA
2018; 5: 180002
Abstract
This corrects the article DOI: 10.1038/sdata.2017.179.
View details for DOI 10.1038/sdata.2018.2
View details for Web of Science ID 000423058900001
View details for PubMedID 29360107
View details for PubMedCentralID PMC5779067
-
Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2018; 115 (2): 379–84
Abstract
A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.
View details for PubMedID 29279374
-
Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity
NATURE GENETICS
2018; 50 (1): 26-41
View details for DOI 10.1038/s41588-017-0011-x
View details for Web of Science ID 000423157400007
-
Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity.
Nature genetics
2018; 50 (1): 26-41
Abstract
Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, noncoding variants from which pinpointing causal genes remains challenging. Here we combined data from 718,734 individuals to discover rare and low-frequency (minor allele frequency (MAF) < 5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which 8 variants were in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2 and ZNF169) newly implicated in human obesity, 2 variants were in genes (MC4R and KSR2) previously observed to be mutated in extreme obesity and 2 variants were in GIPR. The effect sizes of rare variants are ~10 times larger than those of common variants, with the largest effect observed in carriers of an MC4R mutation introducing a stop codon (p.Tyr35Ter, MAF = 0.01%), who weighed ~7 kg more than non-carriers. Pathway analyses based on the variants associated with BMI confirm enrichment of neuronal genes and provide new evidence for adipocyte and energy expenditure biology, widening the potential of genetically supported therapeutic targets in obesity.
View details for DOI 10.1038/s41588-017-0011-x
View details for PubMedID 29273807
View details for PubMedCentralID PMC5945951
-
Data Descriptor: Sequence data and association statistics from 12,940 type 2 diabetes cases and controls
SCIENTIFIC DATA
2017; 4: 170179
Abstract
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
View details for PubMedID 29257133
-
Large-scale GWAS identifies multiple loci for hand grip strength providing biological insights into muscular fitness
NATURE COMMUNICATIONS
2017; 8: 16015
Abstract
Hand grip strength is a widely used proxy of muscular fitness, a marker of frailty, and predictor of a range of morbidities and all-cause mortality. To investigate the genetic determinants of variation in grip strength, we perform a large-scale genetic discovery analysis in a combined sample of 195,180 individuals and identify 16 loci associated with grip strength (P<5×10^-8) in combined analyses. A number of these loci contain genes implicated in structure and function of skeletal muscle fibres (ACTG1), neuronal maintenance and signal transduction (PEX14, TGFA, SYT1), or monogenic syndromes with involvement of psychomotor impairment (PEX14, LRPPRC and KANSL1). Mendelian randomization analyses are consistent with a causal effect of higher genetically predicted grip strength on lower fracture risk. In conclusion, our findings provide new biological insight into the mechanistic underpinnings of grip strength and the causal role of muscular strength in age-related morbidities and mortality.
View details for PubMedID 29313844
-
Mosaic mutations in blood DNA sequence are associated with solid tumor cancers
NPJ GENOMIC MEDICINE
2017; 2: 22
Abstract
Recent understanding of the causal role of blood-detectable somatic protein-truncating DNA variants in leukemia prompts questions about the generalizability of such observations across cancer types. We used The Cancer Genome Atlas exome sequencing (~8000 samples) to compare 22 different cancer phenotypes with more than 6000 controls using a case-control study design and demonstrate that mosaic protein truncating variants in these genes are also associated with solid-tumor cancers. The absence of these cancer-associated mosaic variants from the tumors themselves suggests that they are not themselves tumor drivers. Through analysis of different cancer phenotypes we observe gene-specificity for mosaic mutations. We confirm a specific link between PPM1D and ovarian cancer, consistent with previous reports linking PPM1D to breast and ovarian cancer. Additionally, glioblastoma, melanoma and lung cancers show gene-specific burdens of mosaic protein truncating mutations. Taken together, these results extend existing observations and broadly link solid-tumor cancers to somatic blood DNA changes.
View details for PubMedID 29263833
-
biMM: Efficient estimation of genetic variances and covariances for cohorts with high-dimensional phenotype measurements.
Bioinformatics
2017
Abstract
Genetic research utilizes a decomposition of trait variances and covariances into genetic and environmental parts. Our software package biMM is a computationally efficient implementation of a bivariate linear mixed model for settings where hundreds of traits have been measured on partially overlapping sets of individuals. Implementation in R freely available at www.iki.fi/mpirinen. Contact: matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btx166
View details for PubMedID 28369165
-
Variant Enriched in the Finnish Population is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk.
Diabetes
2017
Abstract
To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequence in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between fasting plasma insulin (FI) and a coding variant (p.Pro50Thr) in AKT2, a gene in which rare fully penetrant mutations are causal for monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in FI levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-h insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio 1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2.
View details for DOI 10.2337/db16-1329
View details for PubMedID 28341696
-
Rare and low-frequency coding variants alter human adult height.
Nature
2017; 542 (7640): 186-190
Abstract
Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1-4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), greater than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1-2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders and highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
View details for DOI 10.1038/nature21039
View details for PubMedID 28146470
-
Landscape of X chromosome inactivation across human tissues.
Nature
2017; 550 (7675): 244-248
Abstract
X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.
View details for PubMedID 29022598
-
Frameshift indels introduced by genome editing can lead to in-frame exon skipping.
PloS one
2017; 12 (6)
Abstract
The introduction of frameshift indels by genome editing has emerged as a powerful technique to study the functions of uncharacterized genes in cell lines and model organisms. Such mutations should lead to mRNA degradation owing to nonsense-mediated mRNA decay or the production of severely truncated proteins. Here, we show that frameshift indels engineered by genome editing can also lead to skipping of "multiple of three nucleotides" exons. Such splicing events result in in-frame mRNA that may encode fully or partially functional proteins. We also characterize a segregating nonsense variant (rs2273865) located in a "multiple of three nucleotides" exon of LGALS8 that increases exon skipping in human erythroblast samples. Our results highlight the potentially frequent contribution of exonic splicing regulatory elements and are important for the interpretation of negative results in genome editing experiments. Moreover, they may contribute to a better annotation of loss-of-function mutations in the human genome.
View details for DOI 10.1371/journal.pone.0178700
View details for PubMedID 28570605
-
TMEM258 Is a Component of the Oligosaccharyltransferase Complex Controlling ER Stress and Intestinal Inflammation.
Cell reports
2016; 17 (11): 2955-2965
Abstract
Significant insights into disease pathogenesis have been gleaned from population-level genetic studies; however, many loci associated with complex genetic disease contain numerous genes, and phenotypic associations cannot be assigned unequivocally. In particular, a gene-dense locus on chromosome 11 (61.5-61.65 Mb) has been associated with inflammatory bowel disease, rheumatoid arthritis, and coronary artery disease. Here, we identify TMEM258 within this locus as a central regulator of intestinal inflammation. Strikingly, Tmem258 haploinsufficient mice exhibit severe intestinal inflammation in a model of colitis. At the mechanistic level, we demonstrate that TMEM258 is a required component of the oligosaccharyltransferase complex and is essential for N-linked protein glycosylation. Consequently, homozygous deficiency of Tmem258 in colonic organoids results in unresolved endoplasmic reticulum (ER) stress culminating in apoptosis. Collectively, our results demonstrate that TMEM258 is a central mediator of ER quality control and intestinal homeostasis.
View details for DOI 10.1016/j.celrep.2016.11.042
View details for PubMedID 27974209
View details for PubMedCentralID PMC5661940
-
Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing.
Scientific reports
2016; 6: 32406
Abstract
Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.
View details for DOI 10.1038/srep32406
View details for PubMedID 27617755
View details for PubMedCentralID PMC5019111
-
Analysis of protein-coding genetic variation in 60,706 humans
NATURE
2016; 536 (7616): 285-?
Abstract
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
View details for DOI 10.1038/nature19057
View details for Web of Science ID 000381804900026
View details for PubMedID 27535533
-
The genetic architecture of type 2 diabetes
NATURE
2016; 536 (7614): 41-?
Abstract
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
View details for DOI 10.1038/nature18642
View details for PubMedID 27398621
-
A Frameshift in CSF2RB Predominant Among Ashkenazi Jews Increases Risk for Crohn's Disease and Reduces Monocyte Signaling via GM-CSF.
Gastroenterology
2016
Abstract
Crohn's disease (CD) has the highest prevalence in Ashkenazi Jewish populations. We sought to identify rare, CD-associated frameshift variants of high functional and statistical effects. We performed exome sequencing and array-based genotype analyses of 1477 Ashkenazi Jewish individuals with CD and 2614 Ashkenazi Jewish individuals without CD (controls). To validate our findings, we performed genotype analyses of an additional 1515 CD cases and 7052 controls for frameshift mutations in the colony-stimulating factor 2-receptor β common subunit gene (CSF2RB). Intestinal tissues and blood samples were collected from patients with CD; lamina propria leukocytes were isolated and expression of CSF2RB and granulocyte-macrophage colony-stimulating factor-responsive cells were defined by adenomatous polyposis coli (APC) time-of-flight mass cytometry (CyTOF analysis). Variants of CSF2RB were transfected into HEK293 cells and the expression and functions of gene products were compared. In the discovery cohort, we associated CD with a frameshift mutation in CSF2RB (P = 8.52 × 10(-4)); the finding was validated in the replication cohort (combined P = 3.42 × 10(-6)). Incubation of intestinal lamina propria leukocytes with granulocyte-macrophage colony-stimulating factor resulted in high levels of phosphorylation of signal transducer and activator of transcription (STAT5) and lesser increases in phosphorylation of extracellular signal-regulated kinase and AK straining transforming (AKT). Cells co-transfected with full-length and mutant forms of CSF2RB had reduced pSTAT5 after stimulation with granulocyte-macrophage colony-stimulating factor, compared with cells transfected with control CSF2RB, indicating a dominant-negative effect of the mutant gene.
Monocytes from patients with CD who were heterozygous for the frameshift mutation (6% of CD cases analyzed) had reduced responses to granulocyte-macrophage colony-stimulating factor and markedly decreased activity of aldehyde dehydrogenase; activity of this enzyme has been associated with immune tolerance. In a genetic analysis of Ashkenazi Jewish individuals, we associated CD with a frameshift mutation in CSF2RB. Intestinal monocytes from carriers of this mutation had reduced responses to granulocyte-macrophage colony-stimulating factor, providing an additional mechanism for alterations to the innate immune response in individuals with CD.
View details for DOI 10.1053/j.gastro.2016.06.045
View details for PubMedID 27377463
-
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.
PloS one
2016; 11 (4): e0153803
Abstract
It has become common practice to analyse large scale sequencing data with statistical approaches based around the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded to be a more discrete definition of a biologically functional unit. Using Pfam definitions, we collapsed rare variants (Minor Allele Frequency ≤ 1%) together in three different ways: 1) variants within single genomic regions which map to individual protein domains; 2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction; 3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-Quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to that of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. We have collapsed rare variants together using protein domain and family coordinates to present an alternative approach over collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
View details for DOI 10.1371/journal.pone.0153803
View details for PubMedID 27128313
View details for PubMedCentralID PMC4851355
-
A null mutation in ANGPTL8 does not associate with either plasma glucose or type 2 diabetes in humans
BMC ENDOCRINE DISORDERS
2016; 16
Abstract
Experiments in mice initially suggested a role for the protein angiopoietin-like 8 (ANGPTL8) in glucose homeostasis. However, subsequent experiments in model systems have challenged this proposed role. We sought to better understand the importance of ANGPTL8 in human glucose homeostasis by examining the association of a null mutation in ANGPTL8 with fasting glucose levels and risk for type 2 diabetes. A naturally-occurring null mutation in human ANGPTL8 (rs145464906; c.361C > T; p.Q121X) is carried by ~1 in 1000 individuals of European ancestry and is associated with higher levels of plasma high-density lipoprotein cholesterol, suggesting that this mutation has functional significance. We examined the association of p.Q121X with fasting glucose levels and risk for type 2 diabetes in up to 95,558 individuals (14,824 type 2 diabetics and 80,734 controls). We found no significant association of p.Q121X with either fasting glucose or type 2 diabetes (p-value = 0.90 and 0.65, respectively). Given our sample sizes, we had >98 % power to detect at least a 0.23 mmol/L effect on plasma glucose and >95 % power to detect a 70 % increase in risk for type 2 diabetes. Disruption of ANGPTL8 function in humans does not seem to have a large effect on measures of glucose tolerance.
View details for DOI 10.1186/s12902-016-0088-8
View details for Web of Science ID 000369216000001
View details for PubMedID 26822414
View details for PubMedCentralID PMC4730725
-
The landscape of genomic imprinting across diverse adult human tissues
GENOME RESEARCH
2015; 25 (7): 927-936
Abstract
Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype-Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.
View details for DOI 10.1101/gr.192278.115
View details for Web of Science ID 000357356900001
View details for PubMedID 25953952
View details for PubMedCentralID PMC4484390
-
The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans
SCIENCE
2015; 348 (6235): 648-660
Abstract
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
View details for DOI 10.1126/science.1262110
View details for Web of Science ID 000354045700036
View details for PubMedCentralID PMC4547484
-
Sharing and Specificity of Co-expression Networks across 35 Human Tissues.
PLoS computational biology
2015; 11 (5)
Abstract
To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This data provides an opportunity for deriving shared and tissue specific gene regulatory networks on the basis of co-expression between genes. However, a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool, available at mostafavilab.stat.ubc.ca/GNAT, which allows exploration of gene function and regulation in a tissue-specific manner.
View details for DOI 10.1371/journal.pcbi.1004220
View details for PubMedID 25970446
View details for PubMedCentralID PMC4430528
-
Whole-genome sequencing to understand the genetic architecture of common gene expression and biomarker phenotypes
HUMAN MOLECULAR GENETICS
2015; 24 (5): 1504-1512
Abstract
Initial results from sequencing studies suggest that there are relatively few low-frequency (<5%) variants associated with large effects on common phenotypes. We performed low-pass whole-genome sequencing in 680 individuals from the InCHIANTI study to test two primary hypotheses: (i) that sequencing would detect single low-frequency-large effect variants that explained similar amounts of phenotypic variance as single common variants, and (ii) that some common variant associations could be explained by low-frequency variants. We tested two sets of disease-related common phenotypes for which we had statistical power to detect large numbers of common variant-common phenotype associations: 11 132 cis-gene expression traits in 450 individuals and 93 circulating biomarkers in all 680 individuals. From a total of 11 657 229 high-quality variants of which 6 129 221 and 5 528 008 were common and low frequency (<5%), respectively, low frequency-large effect associations comprised 7% of detectable cis-gene expression traits [89 of 1314 cis-eQTLs at P < 1 × 10(-06) (false discovery rate ∼5%)] and one of eight biomarker associations at P < 8 × 10(-10). Very few (30 of 1232; 2%) common variant associations were fully explained by low-frequency variants. Our data show that whole-genome sequencing can identify low-frequency variants undetected by genotyping based approaches when sample sizes are sufficiently large to detect substantial numbers of common variant associations, and that common variant associations are rarely explained by single low-frequency variants of large effect.
View details for DOI 10.1093/hmg/ddu560
View details for Web of Science ID 000350142800025
View details for PubMedID 25378555
-
Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction.
Nature
2015; 518 (7537): 102-106
Abstract
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl(-1). At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
View details for DOI 10.1038/nature13917
View details for PubMedID 25487149
-
Identification and Functional Characterization of G6PC2 Coding Variants Influencing Glycemic Traits Define an Effector Transcript at the G6PC2-ABCB11 Locus
PLOS GENETICS
2015; 11 (1)
Abstract
Genome wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P < 5 × 10(-7)) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF)=1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.
View details for DOI 10.1371/journal.pgen.1004876
View details for Web of Science ID 000349314600012
View details for PubMedID 25625282
View details for PubMedCentralID PMC4307976
-
Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol.
American journal of human genetics
2014; 94 (2): 233-245
Abstract
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
View details for DOI 10.1016/j.ajhg.2014.01.010
View details for PubMedID 24507775
-
Transcriptome and genome sequencing uncovers functional variation in humans
NATURE
2013; 501 (7468): 506-511
View details for DOI 10.1038/nature12531
View details for Web of Science ID 000324826300049
-
Association Between Variants of PRDM1 and NDP52 and Crohn's Disease, Based on Exome Sequencing and Functional Studies
GASTROENTEROLOGY
2013; 145 (2): 339-347
Abstract
Genome-wide association studies (GWAS) have identified 140 Crohn's disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through detailed sequencing, genetic association, expression, and functional studies. We sequenced whole exomes of 42 unrelated subjects with CD and 5 healthy subjects (controls) and then filtered single nucleotide variants by incorporating association results from meta-analyses of CD GWAS and in silico mutation effect prediction algorithms. We then genotyped 9348 subjects with CD, 2868 subjects with ulcerative colitis, and 14,567 control subjects and associated variants analyzed in functional studies using materials from subjects and controls and in vitro model systems. We identified rare missense mutations in PR domain-containing 1 (PRDM1) and associated these with CD. These mutations increased proliferation of T cells and secretion of cytokines on activation and increased expression of the adhesion molecule L-selectin. A common CD risk allele, identified in GWAS, correlated with reduced expression of PRDM1 in ileal biopsy specimens and peripheral blood mononuclear cells (combined P = 1.6 × 10(-8)). We identified an association between CD and a common missense variant, Val248Ala, in nuclear domain 10 protein 52 (NDP52) (P = 4.83 × 10(-9)). We found that this variant impairs the regulatory functions of NDP52 to inhibit nuclear factor κB activation of genes that regulate inflammation and affect the stability of proteins in Toll-like receptor pathways. We have extended the results of GWAS and provide evidence that variants in PRDM1 and NDP52 determine susceptibility to CD. PRDM1 maps adjacent to a CD interval identified in GWAS and encodes a transcription factor expressed by T and B cells.
NDP52 is an adaptor protein that functions in selective autophagy of intracellular bacteria and signaling molecules, supporting the role of autophagy in the pathogenesis of CD.
View details for DOI 10.1053/j.gastro.2013.04.040
View details for Web of Science ID 000322630600023
View details for PubMedID 23624108
-
Mosaic PPM1D mutations are associated with predisposition to breast and ovarian cancer.
Nature
2013; 493 (7432): 406-410
Abstract
Improved sequencing technologies offer unprecedented opportunities for investigating the role of rare genetic variation in common disease. However, there are considerable challenges with respect to study design, data analysis and replication. Using pooled next-generation sequencing of 507 genes implicated in the repair of DNA in 1,150 samples, an analytical strategy focused on protein-truncating variants (PTVs) and a large-scale sequencing case-control replication experiment in 13,642 individuals, here we show that rare PTVs in the p53-inducible protein phosphatase PPM1D are associated with predisposition to breast cancer and ovarian cancer. PPM1D PTV mutations were present in 25 out of 7,781 cases versus 1 out of 5,861 controls (P = 1.12 × 10(-5)), including 18 mutations in 6,912 individuals with breast cancer (P = 2.42 × 10(-4)) and 12 mutations in 1,121 individuals with ovarian cancer (P = 3.10 × 10(-9)). Notably, all of the identified PPM1D PTVs were mosaic in lymphocyte DNA and clustered within a 370-base-pair region in the final exon of the gene, carboxy-terminal to the phosphatase catalytic domain. Functional studies demonstrate that the mutations result in enhanced suppression of p53 in response to ionizing radiation exposure, suggesting that the mutant alleles encode hyperactive PPM1D isoforms. Thus, although the mutations cause premature protein truncation, they do not result in the simple loss-of-function effect typically associated with this class of variant, but instead probably have a gain-of-function effect. Our results have implications for the detection and management of breast and ovarian cancer risk. More generally, these data provide new insights into the role of rare and of mosaic genetic variants in common conditions, and the use of sequencing in their identification.
View details for DOI 10.1038/nature11725
View details for Web of Science ID 000313615900052
View details for PubMedID 23242139
-
Rare, Low-Frequency, and Common Variants in the Protein-Coding Sequence of Biological Candidate Genes from GWASs Contribute to Risk of Rheumatoid Arthritis
AMERICAN JOURNAL OF HUMAN GENETICS
2013; 92 (1): 15-27
Abstract
The extent to which variants in the protein-coding sequence of genes contribute to risk of rheumatoid arthritis (RA) is unknown. In this study, we addressed this issue by deep exon sequencing and large-scale genotyping of 25 biological candidate genes located within RA risk loci discovered by genome-wide association studies (GWASs). First, we assessed the contribution of rare coding variants in the 25 genes to the risk of RA in a pooled sequencing study of 500 RA cases and 650 controls of European ancestry. We observed an accumulation of rare nonsynonymous variants exclusive to RA cases in IL2RA and IL2RB (burden test: p = 0.007 and p = 0.018, respectively). Next, we assessed the aggregate contribution of low-frequency and common coding variants to the risk of RA by dense genotyping of the 25 gene loci in 10,609 RA cases and 35,605 controls. We observed a strong enrichment of coding variants with a nominal signal of association with RA (p < 0.05) after adjusting for the best signal of association at the loci (p(enrichment) = 6.4 × 10(-4)). For one locus containing CD2, we found that a missense variant, rs699738 (c.798C>A [p.His266Gln]), and a noncoding variant, rs624988, reside on distinct haplotypes and independently contribute to the risk of RA (p = 4.6 × 10(-6)). Overall, our results indicate that variants (distributed across the allele-frequency spectrum) within the protein-coding portion of a subset of biological candidate genes identified by GWASs contribute to the risk of RA. Further, we have demonstrated that very large sample sizes will be required for comprehensively identifying the independent alleles contributing to the missing heritability of RA.
View details for DOI 10.1016/j.ajhg.2012.11.012
View details for Web of Science ID 000313759000002
View details for PubMedID 23261300
-
Pooled DNA Resequencing of 68 Myocardial Infarction Candidate Genes in French Canadians
CIRCULATION-CARDIOVASCULAR GENETICS
2012; 5 (5): 547-554
Abstract
Familial history is a strong risk factor for coronary artery disease (CAD), especially for early-onset myocardial infarction (MI). Several genes and chromosomal regions have been implicated in the genetic cause of coronary artery disease/MI, mostly through the discovery of familial mutations implicated in hyper-/hypocholesterolemia by linkage studies and single nucleotide polymorphisms by genome-wide association studies. Except for a few examples (eg, PCSK9), the role of low-frequency genetic variation (minor allele frequency [MAF] ≈0.1%-5%) on MI/coronary artery disease predisposition has not been extensively investigated. We selected 68 candidate genes and sequenced their exons (394 kb) in 500 early-onset MI cases and 500 matched controls, all of French-Canadian ancestry, using solution-based capture in pools of nonindexed DNA samples. In these regions, we identified 1852 single nucleotide variants (695 novel) and captured 85% of the variants with MAF≥1% found by the 1000 Genomes Project in Europe-ancestry individuals. Using gene-based association testing, we prioritized for follow-up 29 low-frequency variants in 8 genes and attempted to genotype them for replication in 1594 MI cases and 2988 controls from 2 French-Canadian panels. Our pilot association analysis of low-frequency variants in 68 candidate genes did not identify genes with large effect on MI risk in French Canadians. We have optimized a strategy, applicable to all complex diseases and traits, to discover efficiently and cost-effectively DNA sequence variants in large populations. Resequencing endeavors to find low-frequency variants implicated in common human diseases are likely to require very large sample size.
View details for DOI 10.1161/CIRCGENETICS.112.963165
View details for Web of Science ID 000309886500011
View details for PubMedID 22923420
-
Genetic Adaptation of Fatty-Acid Metabolism: A Human-Specific Haplotype Increasing the Biosynthesis of Long-Chain Omega-3 and Omega-6 Fatty Acids
AMERICAN JOURNAL OF HUMAN GENETICS
2012; 90 (5): 809-820
Abstract
Omega-3 and omega-6 long-chain polyunsaturated fatty acids (LC-PUFAs) are essential for the development and function of the human brain. They can be obtained directly from food, e.g., fish, or synthesized from precursor molecules found in vegetable oils. To determine the importance of genetic variability to fatty-acid biosynthesis, we studied FADS1 and FADS2, which encode rate-limiting enzymes for fatty-acid conversion. We performed genome-wide genotyping (n = 5,652 individuals) and targeted resequencing (n = 960 individuals) of the FADS region in five European population cohorts. We also analyzed available genomic data from human populations, archaic hominins, and more distant primates. Our results show that present-day humans have two common FADS haplotypes (defined by 28 closely linked SNPs across 38.9 kb) that differ dramatically in their ability to generate LC-PUFAs. No independent effects on FADS activity were seen for rare SNPs detected by targeted resequencing. The more efficient, evolutionarily derived haplotype appeared after the lineage split leading to modern humans and Neanderthals and shows evidence of positive selection. This human-specific haplotype increases the efficiency of synthesizing essential long-chain fatty acids from precursors and thereby might have provided an advantage in environments with limited access to dietary LC-PUFAs. In the modern world, this haplotype has been associated with lifestyle-related diseases, such as coronary artery disease.
View details for DOI 10.1016/j.ajhg.2012.03.014
View details for Web of Science ID 000303907500005
View details for PubMedID 22503634
-
A map of human genome variation from population-scale sequencing
NATURE
2010; 467 (7319): 1061-1073
Abstract
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
View details for DOI 10.1038/nature09534
View details for Web of Science ID 000283548600039
View details for PubMedCentralID PMC3042601
-
High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency
NATURE GENETICS
2010; 42 (10): 851-?
Abstract
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes that are involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing and experimental validation to uncover the molecular basis of mitochondrial complex I disorders. We created seven pools of DNA from a cohort of 103 cases and 42 healthy controls and then performed deep sequencing of 103 candidate genes to identify 151 rare variants that were predicted to affect protein function. We established genetic diagnoses in 13 of 60 previously unsolved cases using confirmatory experiments, including cDNA complementation to show that mutations in NUBPL and FOXRED1 can cause complex I deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can be used to identify causal mutations in individual cases.
View details for DOI 10.1038/ng.659
View details for Web of Science ID 000282276600014
View details for PubMedID 20818383
-
Fine Mapping in 94 Inbred Mouse Strains Using a High-Density Haplotype Resource
GENETICS
2010; 185 (3): 1081-1095
Abstract
The genetics of phenotypic variation in inbred mice has for nearly a century provided a primary weapon in the medical research arsenal. A catalog of the genetic variation among inbred mouse strains, however, is required to enable powerful positional cloning and association techniques. A recent whole-genome resequencing study of 15 inbred mouse strains captured a significant fraction of the genetic variation among a limited number of strains, yet the common use of hundreds of inbred strains in medical research motivates the need for a high-density variation map of a larger set of strains. Here we report a dense set of genotypes from 94 inbred mouse strains containing 10.77 million genotypes over 121,433 single nucleotide polymorphisms (SNPs), dispersed at 20-kb intervals on average across the genome, with an average concordance of 99.94% with previous SNP sets. Through pairwise comparisons of the strains, we identified an average of 4.70 distinct segments over 73 classical inbred strains in each region of the genome, suggesting limited genetic diversity between the strains. Combining these data with genotypes of 7570 gap-filling SNPs, we further imputed the untyped or missing genotypes of 94 strains over 8.27 million Perlegen SNPs. The imputation accuracy among classical inbred strains is estimated at 99.7% for the genotypes imputed with high confidence. We demonstrated the utility of these data in high-resolution linkage mapping through power simulations and statistical power analysis and provide guidelines for developing such studies. We also provide a resource of in silico association mapping between the complex traits deposited in the Mouse Phenome Database with our genotypes. We expect that these resources will facilitate effective designs of both human and mouse studies for dissecting the genetic basis of complex traits.
View details for DOI 10.1534/genetics.110.115014
View details for Web of Science ID 000281906800030
View details for PubMedID 20439770
View details for PubMedCentralID PMC2907194
-
Genetic Analysis of Human Traits In Vitro: Drug Response and Gene Expression in Lymphoblastoid Cell Lines
PLOS GENETICS
2008; 4 (11)
Abstract
Lymphoblastoid cell lines (LCLs), originally collected as renewable sources of DNA, are now being used as a model system to study genotype-phenotype relationships in human cells, including searches for QTLs influencing levels of individual mRNAs and responses to drugs and radiation. In the course of attempting to map genes for drug response using 269 LCLs from the International HapMap Project, we evaluated the extent to which biological noise and non-genetic confounders contribute to trait variability in LCLs. While drug responses could be technically well measured on a given day, we observed significant day-to-day variability and substantial correlation to non-genetic confounders, such as baseline growth rates and metabolic state in culture. After correcting for these confounders, we were unable to detect any QTLs with genome-wide significance for drug response. A much higher proportion of variance in mRNA levels may be attributed to non-genetic factors (intra-individual variance--i.e., biological noise, levels of the EBV virus used to transform the cells, ATP levels) than to detectable eQTLs. Finally, in an attempt to improve power, we focused analysis on those genes that had both detectable eQTLs and correlation to drug response; we were unable to detect evidence that eQTL SNPs are convincingly associated with drug response in the model. While LCLs are a promising model for pharmacogenetic experiments, biological noise and in vitro artifacts may reduce power and have the potential to create spurious association due to confounding.
View details for DOI 10.1371/journal.pgen.1000287
View details for Web of Science ID 000261481000040
View details for PubMedID 19043577
View details for PubMedCentralID PMC2583954