Chiara Sabatti
Professor of Biomedical Data Science and of Statistics
Department of Biomedical Data Science
Academic Appointments
-
Professor, Department of Biomedical Data Science
-
Professor, Statistics
-
Member, Bio-X
-
Member, Stanford Cancer Institute
-
Associate Director, Stanford Data Science
-
Member, Women in Data Science
Administrative Appointments
-
Associate Director, Data Science BS (working with the MCS major since 2012) (2022 - Present)
-
Associate Chair for Education and Training, Biomedical Data Science (2020 - Present)
-
Associate Director, Stanford Data Science (2018 - Present)
-
Vice Chair, Biomedical Data Science (2018 - 2019)
Honors & Awards
-
Fellow, Institute of Mathematical Statistics (2022)
-
CAREER, NSF (2003-08)
Professional Education
-
PostDoctoral, Stanford, Genetics (2000)
-
PhD, Stanford, Statistics (1998)
-
BS & MS, Bocconi University, Statistics and Economics (1993)
Current Research and Scholarly Interests
Statistical models and reasoning are key to our understanding of the genetic basis of human traits. Modern high-throughput technology presents us with new opportunities and challenges. We develop statistical approaches for high-dimensional data in an attempt to improve our understanding of the molecular basis of health-related traits.
Clinical Trials
-
Perfusion CT Monitoring to Predict Treatment Efficacy in Renal Cell Carcinoma
Not Recruiting
This pilot clinical trial studies perfusion computed tomography (CT) in predicting response to treatment in patients with advanced kidney cancer. Comparing results of diagnostic procedures done before, during, and after targeted therapy may help doctors predict a patient's response to treatment and help plan the best treatment.
Stanford is currently not accepting patients for this trial. For more information, please contact Yoriko Imae, 650-498-5186.
2024-25 Courses
- Consulting Workshop on Biomedical Data Science
BIODS 232 (Aut, Win, Spr)
- Data Narratives
DATASCI 120, MCS 120 (Spr)
- Inclusive Mentorship in Data Science
BIODS 360, BIOMEDIN 360 (Win)
- The Data Science Experience
DATASCI 190 (Spr)
Independent Studies (8)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIODS 299 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Independent Study
DATASCI 199 (Aut, Win, Spr, Sum)
- Independent Study
STATS 199 (Aut, Win, Spr, Sum)
- Independent Study
STATS 299 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Research
STATS 399 (Aut, Win, Spr, Sum)
Prior Year Courses
2023-24 Courses
- Consulting Workshop on Biomedical Data Science
BIODS 232 (Aut, Win, Spr)
- Critical Exploration of Topics in Biomedical Data Science: Generative AI
BIODS 290 (Aut)
- Data Narratives
DATASCI 120, MCS 120 (Spr)
- Inclusive Mentorship in Data Science
BIODS 360, BIOMEDIN 360 (Win)
- The Data Science Experience
DATASCI 190 (Spr)
2022-23 Courses
- Consulting Workshop on Biomedical Data Science
BIODS 232 (Aut, Win, Spr)
- Data Narratives
DATASCI 120, MCS 120 (Spr)
- Inclusive Mentorship in Data Science
BIODS 360, BIOMEDIN 360 (Win)
2021-22 Courses
- Consulting Workshop on Biomedical Data Science
BIODS 232 (Aut, Win, Spr)
- Data Narratives
MCS 120 (Spr)
- Inclusive Mentorship in Data Science
BIODS 360, BIOMEDIN 360 (Win)
Stanford Advisees
- Doctoral Dissertation Reader (AC)
Zhaomeng Chen, Julie Zhang
- Postdoctoral Faculty Sponsor
Benjamin Chu
- Doctoral Dissertation Advisor (AC)
Paula Gablenz
- Master's Program Advisor
Misha Baitemirova
- Undergraduate Major Advisor
Gowri Vadmal
- Doctoral (Program)
Max Schuessler
Graduate and Fellowship Programs
-
Biomedical Data Science (PhD Program)
-
Biomedical Data Science (Masters Program)
All Publications
-
In silico identification of putative causal genetic variants.
bioRxiv : the preprint server for biology
2024
Abstract
Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Despite the widespread availability of genome-wide data, existing methods to analyze genetic data still primarily focus on marginal association models, which fall short of fully capturing the polygenic nature of complex traits and elucidating biological causal mechanisms. Here we present a computationally efficient causal inference framework for genome-wide detection of putative causal variants underlying genetic associations. Our approach utilizes summary statistics from potentially overlapping studies as input, constructs in silico knockoff copies of summary statistics as negative controls to attenuate confounding effects induced by linkage disequilibrium, and employs efficient ultrahigh-dimensional sparse regression to jointly model all genetic variants across the genome. Our method is computationally efficient, requiring less than 15 minutes on a single CPU to analyze genome-wide summary statistics. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD) we identified 82 loci associated with AD, including 37 additional loci missed by conventional GWAS pipeline via marginal association testing. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of large-scale genome-wide association studies (GWAS) summary statistics from 2013 to 2022. Results reveal the method's capacity to robustly discover additional loci for polygenic traits beyond conventional GWAS and pinpoint potential causal variants underpinning each locus (on average, 22.7% more loci and 78.7% fewer proxy variants), contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses. 
We are making the discoveries and software freely available to the community and anticipate that routine end-to-end in silico identification of putative causal genetic variants will become an important tool that will facilitate downstream functional experiments and future research into disease etiology, as well as the exploration of novel therapeutic avenues.
View details for DOI 10.1101/2024.02.28.582621
View details for PubMedID 38464202
-
Geospatial investigations in Colombia reveal variations in the distribution of mood and psychotic disorders.
Communications medicine
2024; 4 (1): 26
Abstract
BACKGROUND: Geographical variations in mood and psychotic disorders have been found in upper-income countries. We looked for geographic variation in these disorders in Colombia, a middle-income country. We analyzed electronic health records from the Clinica San Juan de Dios Manizales (CSJDM), which provides comprehensive mental healthcare for the one million inhabitants of Caldas. METHODS: We constructed a friction surface map of Caldas and used it to calculate the travel-time to the CSJDM for 16,295 patients who had received an initial diagnosis of mood or psychotic disorder. Using a zero-inflated negative binomial regression model, we determined the relationship between travel-time and incidence, stratified by disease severity. We employed spatial scan statistics to look for patient clusters. RESULTS: We show that travel-times (for driving) to the CSJDM are less than 1h for ~50% of the population and more than 4h for ~10%. We find a distance-decay relationship for outpatients, but not for inpatients: for every hour increase in travel-time, the number of expected outpatient cases decreases by 20% (RR=0.80, 95% confidence interval [0.71, 0.89], p=5.67E-05). We find nine clusters/hotspots of inpatients. CONCLUSIONS: Our results reveal inequities in access to healthcare: many individuals requiring only outpatient treatment may live too far from the CSJDM to access healthcare. Targeting of resources to comprehensively identify severely ill individuals living in the observed hotspots could further address treatment inequities and enable investigations to determine factors generating these hotspots.
View details for DOI 10.1038/s43856-024-00441-x
View details for PubMedID 38383761
-
Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression.
ArXiv
2024
Abstract
Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs He et al. [2022] and introduce variable selection methods based on penalized regression achieving false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and evidence a significant improvement in power.
View details for PubMedID 38463500
-
Filtering the rejection set while preserving false discovery rate control.
Journal of the American Statistical Association
2023; 118 (541): 165-176
Abstract
Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the p-values are positively dependent. We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.
View details for DOI 10.1080/01621459.2021.1920958
View details for PubMedID 37346227
View details for PubMedCentralID PMC10281705
-
GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies.
Nature communications
2022; 13 (1): 7209
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
View details for DOI 10.1038/s41467-022-34932-z
View details for PubMedID 36418338
-
Transfer Learning in Genome-Wide Association Studies with Knockoffs
SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS
2022
View details for DOI 10.1007/s13571-022-00297-y
View details for Web of Science ID 000883204000001
-
GENETICS OF SEVERE MENTAL ILLNESS IN SOUTH AMERICA
ELSEVIER. 2022: E25
View details for DOI 10.1016/j.euroneuro.2022.07.057
View details for Web of Science ID 000898544600023
-
Searching for robust associations with a multi-environment knockoff filter.
Biometrika
2022; 109 (3): 611-629
Abstract
This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is widely applicable, this paper highlights its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
View details for DOI 10.1093/biomet/asab055
View details for PubMedID 38633763
View details for PubMedCentralID PMC11022501
-
Searching for robust associations with a multi-environment knockoff filter
BIOMETRIKA
2022; 109 (3): 611-629
View details for DOI 10.1093/biomet/asab055
View details for Web of Science ID 000844406300006
-
DETECTING MULTIPLE REPLICATING SIGNALS USING ADAPTIVE FILTERING PROCEDURES
ANNALS OF STATISTICS
2022; 50 (4): 1890-1909
View details for DOI 10.1214/21-AOS2139
View details for Web of Science ID 000847855400002
-
Data Science in a Time of Crisis: Lessons from the Pandemic
STATISTICAL SCIENCE
2022; 37 (2): 160-161
View details for DOI 10.1214/22-STS372IN
View details for Web of Science ID 000798149000002
-
Hypotheses on a tree: new error rates and testing strategies
BIOMETRIKA
2021; 108 (3): 575-590
View details for DOI 10.1093/biomet/asaa086
View details for Web of Science ID 000732348300007
-
False discovery rate control in genome-wide association studies with population structure
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2021; 118 (40)
View details for DOI 10.1073/pnas.2105841118
View details for Web of Science ID 000705930300022
-
Increased activation product of complement 4 protein in plasma of individuals with schizophrenia.
Translational psychiatry
2021; 11 (1): 486
Abstract
Structural variation in the complement 4 gene (C4) confers genetic risk for schizophrenia. The variation includes an increased C4A copy number, which predicts increased C4A mRNA expression. C4-anaphylatoxin (C4-ana) is a C4 protein fragment released upon C4 protein activation that has the potential to change the blood-brain barrier (BBB). We hypothesized that elevated plasma levels of C4-ana occur in individuals with schizophrenia (iSCZ). Blood was collected from 15 iSCZ with illness duration < 5 years and from 14 healthy controls (HC). Plasma C4-ana was measured by radioimmunoassay. Other complement activation products C3-ana, C5-ana, and terminal complement complex (TCC) were also measured. Digital-droplet PCR was used to determine C4 gene structural variation state. Recombinant C4-ana was added to primary brain endothelial cells (BEC) and permeability was measured in vitro. C4-ana concentration was elevated in plasma from iSCZ compared to HC (mean=654±16 ng/mL vs. 557±94 ng/mL, respectively, p=0.01). The patients also carried more copies of the C4AL gene and demonstrated a positive correlation between plasma C4-ana concentrations and C4A gene copy number. Furthermore, C4-ana increased the permeability of a monolayer of BEC in vitro. Our findings are consistent with a specific role for C4A protein in schizophrenia and raise the possibility that its activation product, C4-ana, increases BBB permeability. Exploratory analyses suggest the novel hypothesis that the relationship between C4-ana levels and C4A gene copy number could also be altered in iSCZ, suggesting an interaction with unknown genetic and/or environmental risk factors.
View details for DOI 10.1038/s41398-021-01583-5
View details for PubMedID 34552056
-
Hypotheses on a tree: new error rates and testing strategies.
Biometrika
2021; 108 (3): 575-590
Abstract
We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the p-values. Through simulations, we demonstrate that the proposed procedure provides the desired guarantees under a range of dependency structures and that it has the potential to gain power over alternative methods. Finally, we apply the method to studies on the genetic regulation of gene expression across multiple tissues and on the relation between the gut microbiome and colorectal cancer.
View details for DOI 10.1093/biomet/asaa086
View details for PubMedID 36825068
View details for PubMedCentralID PMC9945647
-
Revealing enzyme functional architecture via high-throughput microfluidic enzyme kinetics.
Science (New York, N.Y.)
2021; 373 (6553)
Abstract
Systematic and extensive investigation of enzymes is needed to understand their extraordinary efficiency and meet current challenges in medicine and engineering. We present HT-MEK (High-Throughput Microfluidic Enzyme Kinetics), a microfluidic platform for high-throughput expression, purification, and characterization of more than 1500 enzyme variants per experiment. For 1036 mutants of the alkaline phosphatase PafA (phosphate-irrepressible alkaline phosphatase of Flavobacterium), we performed more than 670,000 reactions and determined more than 5000 kinetic and physical constants for multiple substrates and inhibitors. We uncovered extensive kinetic partitioning to a misfolded state and isolated catalytic effects, revealing spatially contiguous regions of residues linked to particular aspects of function. Regions included active-site proximal residues but extended to the enzyme surface, providing a map of underlying architecture not possible to derive from existing approaches. HT-MEK has applications that range from understanding molecular mechanisms to medicine, engineering, and design.
View details for DOI 10.1126/science.abf8761
View details for PubMedID 34437092
View details for PubMedCentralID PMC8454890
-
Revealing enzyme functional architecture via high-throughput microfluidic enzyme kinetics
SCIENCE
2021; 373 (6553): 411-+
View details for DOI 10.1126/science.abf8761
View details for Web of Science ID 000679232100028
-
Filtering the Rejection Set While Preserving False Discovery Rate Control
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2021
View details for DOI 10.1080/01621459.2021.1920958
View details for Web of Science ID 000656734700001
-
Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease.
Cell
2021
Abstract
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.
View details for DOI 10.1016/j.cell.2021.03.050
View details for PubMedID 33864768
-
False discovery rate control in genome-wide association studies with population structure.
Proceedings of the National Academy of Sciences of the United States of America
2021; 118 (40)
Abstract
We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows the generation of imperfect copies (knockoffs) of these variables that serve as ideal negative controls, correcting for linkage disequilibrium and accounting for unknown population structure, which may be due to diverse ancestries or familial relatedness. The validity and effectiveness of our method are demonstrated by extensive simulations and by applications to the UK Biobank data. These analyses confirm our method is powerful relative to state-of-the-art alternatives, while comparisons with other studies validate most of our discoveries. Finally, fast software is made available for researchers to analyze Biobank-scale datasets.
View details for DOI 10.1073/pnas.2105841118
View details for PubMedID 34580220
-
Discussion of the Paper "Prediction, Estimation, and Attribution" by B. Efron
INTERNATIONAL STATISTICAL REVIEW
2020; 88: S60–S63
View details for DOI 10.1111/insr.12412
View details for Web of Science ID 000603161400004
-
Progenitor identification and SARS-CoV-2 infection in human distal lung organoids.
Nature
2020
Abstract
The distal lung contains terminal bronchioles and alveoli that facilitate gas exchange. Three-dimensional in vitro human distal lung culture systems would strongly facilitate investigation of pathologies including interstitial lung disease, cancer, and SARS-CoV-2-associated COVID-19 pneumonia. We generated long-term feeder-free, chemically defined culture of distal lung progenitors as organoids derived from single adult human alveolar epithelial type II (AT2) or KRT5+ basal cells. AT2 organoids exhibited AT1 transdifferentiation potential while basal cell organoids developed lumens lined by differentiated club and ciliated cells. Single cell analysis of basal organoid KRT5+ cells revealed a distinct ITGA6+ITGB4+ mitotic population whose proliferation further segregated to a TNFRSF12Ahi subfraction comprising ~10% of KRT5+ basal cells, residing in clusters within terminal bronchioles and exhibiting enriched clonogenic organoid growth activity. Distal lung organoids were created with apical-out polarity to display ACE2 on the exposed external surface, facilitating SARS-CoV-2 infection of AT2 and basal cultures and identifying club cells as a novel target population. This long-term, feeder-free organoid culture of human distal lung, coupled with single cell analysis, identifies unsuspected basal cell functional heterogeneity and establishes a facile in vitro organoid model for human distal lung infections including COVID-19-associated pneumonia.
View details for DOI 10.1038/s41586-020-3014-1
View details for PubMedID 33238290
-
The GTEx Consortium atlas of genetic regulatory effects across human tissues
SCIENCE
2020; 369 (6509): 1318-+
View details for DOI 10.1126/science.aaz1776
View details for Web of Science ID 000569840300041
-
Genome-wide mapping of brain phenotypes in extended pedigrees with strong genetic loading for bipolar disorder.
Molecular psychiatry
2020
Abstract
Bipolar disorder is a highly heritable illness, associated with alterations of brain structure. As such, identification of genes influencing inter-individual differences in brain morphology may help elucidate the underlying pathophysiology of bipolar disorder (BP). To identify quantitative trait loci (QTL) that contribute to phenotypic variance of brain structure, structural neuroimages were acquired from family members (n=527) of extended pedigrees heavily loaded for bipolar disorder ascertained from genetically isolated populations in Latin America. Genome-wide linkage and association analysis were conducted on the subset of heritable brain traits that showed significant evidence of association with bipolar disorder (n=24) to map QTL influencing regional measures of brain volume and cortical thickness. Two chromosomal regions showed significant evidence of linkage; a QTL on chromosome 1p influencing corpus callosum volume and a region on chromosome 7p linked to cortical volume. Association analysis within the two QTLs identified three SNPs correlated with the brain measures.
View details for DOI 10.1038/s41380-020-0805-6
View details for PubMedID 32606377
-
Distinct and shared contributions of diagnosis and symptom domains to cognitive performance in severe mental illness in the Paisa population: a case-control study
LANCET PSYCHIATRY
2020; 7 (5): 411–19
View details for Web of Science ID 000529065000034
-
Discussion of the Paper "Prediction, Estimation, and Attribution" by B. Efron
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2020; 115 (530): 656–58
View details for DOI 10.1080/01621459.2020.1762618
View details for Web of Science ID 000538423300012
-
Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates.
Translational psychiatry
2020; 10 (1): 74
Abstract
Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To identify roles of rare and common variants on BP, we conducted genetic analyses in 26 Colombia and Costa Rica pedigrees ascertained for bipolar disorder 1 (BP1), the most severe and heritable form of BP. In these pedigrees, we performed microarray SNP genotyping of 838 individuals and high-coverage whole-genome sequencing of 449 individuals. We compared polygenic risk scores (PRS), estimated using the latest BP1 genome-wide association study (GWAS) summary statistics, between BP1 individuals and related controls. We also evaluated whether BP1 individuals had a higher burden of rare deleterious single-nucleotide variants (SNVs) and rare copy number variants (CNVs) in a set of genes related to BP1. We found that compared with unaffected relatives, BP1 individuals had higher PRS estimated from BP1 GWAS statistics (P=0.001~0.007) and displayed modest increase in burdens of rare deleterious SNVs (P=0.047) and rare CNVs (P=0.002~0.033) in genes related to BP1. We did not observe rare variants segregating in the pedigrees. These results suggest that small-to-moderate effect rare and common variants are more likely to contribute to BP1 risk in these extended pedigrees than a few large-effect rare variants.
View details for DOI 10.1038/s41398-020-0758-1
View details for PubMedID 32094344
-
Multi-resolution localization of causal variants across the genome.
Nature communications
2020; 11 (1): 1093
Abstract
In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
View details for DOI 10.1038/s41467-020-14791-2
View details for PubMedID 32107378
-
Publisher Correction: Multi-resolution localization of causal variants across the genome.
Nature communications
2020; 11 (1): 1799
Abstract
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
View details for DOI 10.1038/s41467-020-15690-2
View details for PubMedID 32265451
-
A Quantitative Proteome Map of the Human Body.
Cell
2020
Abstract
Determining protein levels in each tissue and how they compare with RNA levels is important for understanding human biology and disease as well as regulatory processes that control protein levels. We quantified the relative protein levels from over 12,000 genes across 32 normal human tissues. Tissue-specific or tissue-enriched proteins were identified and compared to transcriptome data. Many ubiquitous transcripts are found to encode tissue-specific proteins. Discordance of RNA and protein enrichment revealed potential sites of synthesis and action of secreted proteins. The tissue-specific distribution of proteins also provides an in-depth view of complex biological events that require the interplay of multiple tissues. Most importantly, our study demonstrated that protein tissue-enrichment information can explain phenotypes of genetic diseases, which cannot be obtained by transcript information alone. Overall, our results demonstrate how understanding protein levels can provide insights into regulation, secretome, metabolism, and human diseases.
View details for DOI 10.1016/j.cell.2020.08.036
View details for PubMedID 32916130
-
Causal inference in genetic trio studies.
Proceedings of the National Academy of Sciences of the United States of America
2020
Abstract
We introduce a method to draw causal inferences, that is, inferences immune to all possible confounding, from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.
View details for DOI 10.1073/pnas.2007743117
View details for PubMedID 32948695
-
Genetic analysis of activity, brain and behavioral associations in extended families with heavy genetic loading for bipolar disorder.
Psychological medicine
2019: 1–9
Abstract
BACKGROUND: Disturbed sleep and activity are prominent features of bipolar disorder type I (BP-I). However, the relationship of sleep and activity characteristics to brain structure and behavior in euthymic BP-I patients and their non-BP-I relatives is unknown. Additionally, underlying genetic relationships between these traits have not been investigated. METHODS: Relationships between sleep and activity phenotypes, assessed using actigraphy, with structural neuroimaging (brain) and cognitive and temperament (behavior) phenotypes were investigated in 558 euthymic individuals from multi-generational pedigrees including at least one member with BP-I. Genetic correlations between actigraphy-brain and actigraphy-behavior associations were assessed, and bivariate linkage analysis was conducted for trait pairs with evidence of shared genetic influences. RESULTS: More physical activity and longer awake time were significantly associated with increased brain volumes and cortical thickness, better performance on neurocognitive measures of long-term memory and executive function, and less extreme scores on measures of temperament (impulsivity, cyclothymia). These associations did not differ between BP-I patients and their non-BP-I relatives. For nine activity-brain or activity-behavior pairs there was evidence for shared genetic influence (genetic correlations); of these pairs, a suggestive bivariate quantitative trait locus on chromosome 7 for wake duration and verbal working memory was identified. CONCLUSIONS: Our findings indicate that increased physical activity and more adequate sleep are associated with increased brain size, better cognitive function and more stable temperament in BP-I patients and their non-BP-I relatives. Additionally, we found evidence for pleiotropy of several actigraphy-behavior and actigraphy-brain phenotypes, suggesting a shared genetic basis for these traits.
View details for DOI 10.1017/S0033291719003416
View details for PubMedID 31813409
-
Genetic regulation of gene expression and splicing during a 10-year period of human aging.
Genome biology
2019; 20 (1): 230
Abstract
BACKGROUND: Molecular and cellular changes are intrinsic to aging and age-related diseases. Prior cross-sectional studies have investigated the combined effects of age and genetics on gene expression and alternative splicing; however, there has been no long-term, longitudinal characterization of these molecular changes, especially in older age. RESULTS: We perform RNA sequencing in whole blood from the same individuals at ages 70 and 80 to quantify how gene expression, alternative splicing, and their genetic regulation are altered during this 10-year period of advanced aging at a population and individual level. We observe that individuals are more similar to their own expression profiles later in life than profiles of other individuals their own age. We identify 1291 and 294 genes differentially expressed and alternatively spliced with age, as well as 529 genes with outlying individual trajectories. Further, we observe a strong correlation of genetic effects on expression and splicing between the two ages, with a small subset of tested genes showing a reduction in genetic associations with expression and splicing in older age. CONCLUSIONS: These findings demonstrate that, although the transcriptome and its genetic regulation is mostly stable late in life, a small subset of genes is dynamic and is characterized by a reduction in genetic regulation, most likely due to increasing environmental variance with age.
View details for DOI 10.1186/s13059-019-1840-y
View details for PubMedID 31684996
-
LEVERAGING ELECTRONIC HOSPITAL RECORDS FOR PSYCHIATRIC PHENOTYPING
ELSEVIER. 2019: S40–S41
View details for DOI 10.1016/j.euroneuro.2019.07.080
View details for Web of Science ID 000488216600080
-
GENETICS OF SEVERE MENTAL ILLNESS IN A COLOMBIAN POPULATION ISOLATE
ELSEVIER. 2019: S24–S25
View details for DOI 10.1016/j.euroneuro.2019.07.049
View details for Web of Science ID 000488216600050
-
THE RELATIONSHIP BETWEEN GENOME-WIDE SIGNIFICANT GWAS LOCI AND PSYCHIATRIC PHENOTYPES IN A COLOMBIAN POPULATION ISOLATE
ELSEVIER. 2019: S39–S40
View details for DOI 10.1016/j.euroneuro.2019.07.079
View details for Web of Science ID 000488216600079
-
NLP STRATEGIES FOR ANALYZING FREE-TEXT PSYCHIATRIC ELECTRONIC HOSPITAL RECORDS
ELSEVIER. 2019: S127
View details for DOI 10.1016/j.euroneuro.2019.08.027
View details for Web of Science ID 000488216600232
-
Genetic dysregulation of gene expression and splicing during a ten-year period of human aging
NATURE PUBLISHING GROUP. 2019: 1688
View details for Web of Science ID 000489313905157
-
GENETICS OF SEVERE MENTAL ILLNESS: THE "PAISA PROJECT"
ELSEVIER. 2019: S38–S39
View details for DOI 10.1016/j.euroneuro.2019.07.077
View details for Web of Science ID 000488216600077
-
Exome sequencing of Finnish isolates enhances rare-variant association power.
Nature
2019
Abstract
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.
View details for DOI 10.1038/s41586-019-1457-z
View details for PubMedID 31367044
-
Selection-adjusted inference: an application to confidence intervals for cis-eQTL effect sizes.
Biostatistics (Oxford, England)
2019
Abstract
The goal of expression quantitative trait loci (eQTL) studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20 000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining this data is relatively cheap once a specimen is at hand, obtaining human tissue remains a costly endeavor: eQTL studies continue to be based on relatively small sample sizes, with this limitation particularly serious for tissues such as brain and liver, often the organs of most immediate medical relevance. Given the high-dimensional nature of these datasets and the large number of hypotheses tested, the scientific community adopted multiplicity adjustment procedures early on. These testing procedures primarily control the false discovery rate for the identification of genetic variants with influence on the expression levels. In contrast, a problem that has not received much attention to date is that of providing estimates of the effect sizes associated with these variants, in a way that accounts for the considerable amount of selection. Yet, given the difficulty of procuring additional samples, this challenge is of practical importance. We illustrate in this work how the recently developed conditional inference approach can be deployed to obtain confidence intervals for the eQTL effect sizes with reliable coverage. The procedure we propose is based on a randomized hierarchical strategy with a 2-fold contribution: (1) it reflects the selection steps typically adopted in state of the art investigations and (2) it introduces the use of randomness instead of data-splitting to maximize the use of available data. Analysis of the GTEx Liver dataset (v6) suggests that naively obtained confidence intervals would likely not cover the true values of effect sizes and that the number of local genetic polymorphisms influencing the expression level of genes might be underestimated.
View details for DOI 10.1093/biostatistics/kxz024
View details for PubMedID 31301173
-
Genetic analyses of diverse populations improves discovery for complex traits.
Nature
2019
Abstract
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1-3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4-10]. Additionally, effect sizes and the risk prediction scores derived from them in one population may not accurately extrapolate to other populations [11,12]. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions [13], the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
View details for DOI 10.1038/s41586-019-1310-4
View details for PubMedID 31217584
-
Exploratory Gene Ontology Analysis with Interactive Visualization.
Scientific reports
2019; 9 (1): 7793
Abstract
The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple it with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate with the rest of the ontology structure. Here we present new visualization strategies to facilitate the exploration and use of the information in the GO. We rely on novel graphical display and software architecture that allow significant interaction. To illustrate the potential of our strategies, we provide examples from high-throughput genomic analyses, including chromatin immunoprecipitation experiments and genome-wide association studies. The scientist can also use our visualizations to identify gene sets that likely experience coordinated changes in their expression and use them to simulate biologically-grounded single cell RNA sequencing data, or conduct power studies for differential gene expression studies using our built-in pipeline. Our software and documentation are available at http://aegis.stanford.edu.
View details for DOI 10.1038/s41598-019-42178-x
View details for PubMedID 31127124
-
Gene hunting with hidden Markov model knockoffs
BIOMETRIKA
2019; 106 (1): 1–18
View details for DOI 10.1093/biomet/asy033
View details for Web of Science ID 000460615100001
-
MULTILAYER KNOCKOFF FILTER: CONTROLLED VARIABLE SELECTION AT MULTIPLE RESOLUTIONS.
The annals of applied statistics
2019; 13 (1): 1-33
Abstract
We tackle the problem of selecting from among a large number of variables those that are "important" for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. In this context, to discover that a variable is relevant for the outcome implies discovering that the larger entity it represents is also important. To guarantee meaningful results with high chance of replicability, we suggest controlling the rate of false discoveries for findings at the level of individual variables and at the level of groups. Building on the knockoff construction of Barber and Candès [Ann. Statist.43 (2015) 2055-2085] and the multilayer testing framework of Barber and Ramdas [J. Roy. Statist. Soc. Ser. B79 (2017) 1247-1268], we introduce the multilayer knockoff filter (MKF). We prove that MKF simultaneously controls the FDR at each resolution and use simulations to show that it incurs little power loss compared to methods that provide guarantees only for the discoveries of individual variables. We apply MKF to analyze a genetic dataset and find that it successfully reduces the number of false gene discoveries without a significant reduction in power.
View details for DOI 10.1214/18-AOAS1185
View details for Web of Science ID 000464000700001
View details for PubMedID 31687060
View details for PubMedCentralID PMC6827557
-
Gene hunting with hidden Markov model knockoffs.
Biometrika
2019; 106 (1): 1–18
Abstract
Modern scientific studies often require the identification of a subset of explanatory variables. Several statistical methods have been developed to automate this task, and the framework of knockoffs has been proposed as a general solution for variable selection under rigorous Type I error control, without relying on strong modelling assumptions. In this paper, we extend the methodology of knockoffs to problems where the distribution of the covariates can be described by a hidden Markov model. We develop an exact and efficient algorithm to sample knockoff variables in this setting and then argue that, combined with the existing selective framework, this provides a natural and powerful tool for inference in genome-wide association studies with guaranteed false discovery rate control. We apply our method to datasets on Crohn's disease and some continuous phenotypes.
View details for PubMedID 30799875
-
Rejoinder: "Gene hunting with hidden Markov model knockoffs"
BIOMETRIKA
2019; 106 (1): 35–45
View details for DOI 10.1093/biomet/asy075
View details for Web of Science ID 000460615100006
-
Author Correction: Exome sequencing of Finnish isolates enhances rare-variant association power.
Nature
2019
Abstract
An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
View details for DOI 10.1038/s41586-019-1726-x
View details for PubMedID 31686056
-
Organoid Modeling of the Tumor Immune Microenvironment.
Cell
2018; 175 (7): 1972
Abstract
In vitro cancer cultures, including three-dimensional organoids, typically contain exclusively neoplastic epithelium but require artificial reconstitution to recapitulate the tumor microenvironment (TME). The co-culture of primary tumor epithelia with endogenous, syngeneic tumor-infiltrating lymphocytes (TILs) as a cohesive unit has been particularly elusive. Here, an air-liquid interface (ALI) method propagated patient-derived organoids (PDOs) from >100 human biopsies or mouse tumors in syngeneic immunocompetent hosts as tumor epithelia with native embedded immune cells (T, B, NK, macrophages). Robust droplet-based, single-cell simultaneous determination of gene expression and immune repertoire indicated that PDO TILs accurately preserved the original tumor T cell receptor (TCR) spectrum. Crucially, human and murine PDOs successfully modeled immune checkpoint blockade (ICB) with anti-PD-1- and/or anti-PD-L1 expanding and activating tumor antigen-specific TILs and eliciting tumor cytotoxicity. Organoid-based propagation of primary tumor epithelium en bloc with endogenous immune stroma should enable immuno-oncology investigations within the TME and facilitate personalized immunotherapy testing.
View details for PubMedID 30550791
-
Multiregion Quantification of Extracellular Signal-regulated Kinase Activity in Renal Cell Carcinoma.
European urology oncology
2018
Abstract
To personalize treatment for renal cell carcinoma (RCC), it would be ideal to confirm the activity of druggable protein pathways within individual tumors. We have developed a high-resolution nanoimmunoassay (NIA) to measure protein activity with high precision in scant specimens (eg, fine needle aspirates [FNAs]). Here, we used NIA to determine whether protein activation varied in different regions of RCC tumors. Since most RCC therapies target angiogenesis by inhibiting the vascular endothelial growth factor (VEGF) receptor, we quantified phosphorylation of extracellular signal-regulated kinase (ERK), a downstream effector of the VEGF signaling pathway. In 90 ex vivo FNA biopsies sampled from multiple regions of 38 primary clear cell RCC tumors, ERK phosphorylation differed among patients. In contrast, within individual patients, we found limited intratumoral heterogeneity of ERK phosphorylation. Our results suggest that measuring ERK in a single FNA may be representative of ERK activity in different regions of the same tumor. As diagnostic and therapeutic protein biomarkers are being sought, NIA measurements of protein signaling may increase the clinical utility of renal mass biopsy and allow for the application of precision oncology for patients with localized and advanced RCC. PATIENT SUMMARY: In this report, we applied a new approach to measure the activity of extracellular signal-regulated kinase (ERK), a key cancer signaling protein, in different areas within kidney cancers. We found that ERK activity varied between patients, but that different regions within individual kidney tumors showed similar ERK activity. This suggests that a single biopsy of renal cell carcinoma may be sufficient to measure protein signaling activity to aid in precision oncology approaches.
View details for DOI 10.1016/j.euo.2018.09.011
View details for PubMedID 31412000
-
Understanding the Hidden Complexity of Latin American Population Isolates.
American journal of human genetics
2018; 103 (5): 707–26
Abstract
Most population isolates examined to date were founded from a single ancestral population. Consequently, there is limited knowledge about the demographic history of admixed population isolates. Here we investigate genomic diversity of recently admixed population isolates from Costa Rica and Colombia and compare their diversity to a benchmark population isolate, the Finnish. These Latin American isolates originated during the 16th century from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. We examine whole-genome sequence data from 449 individuals, ascertained as families to build multigenerational pedigrees, with a mean sequencing depth of coverage of approximately 36×. We find that Latin American isolates have increased genetic diversity relative to the Finnish. However, there is an increase in the amount of identity by descent (IBD) segments in the Latin American isolates relative to the Finnish. The increase in IBD segments is likely a consequence of a very recent and severe population bottleneck during the founding of the admixed population isolates. Furthermore, the proportion of the genome that falls within a long run of homozygosity (ROH) in Costa Rican and Colombian individuals is significantly greater than that in the Finnish, suggesting more recent consanguinity in the Latin American isolates relative to that seen in the Finnish. Lastly, we find that recent consanguinity increased the number of deleterious variants found in the homozygous state, which is relevant if deleterious variants are recessive. Our study suggests that there is no single genetic signature of a population isolate.
View details for PubMedID 30401458
-
Facile generation of single-cell transcriptome and immune repertoire freshly isolated from clinical tumor specimens
AMER ASSOC CANCER RESEARCH. 2018
View details for DOI 10.1158/1538-7445.AM2018-5672
View details for Web of Science ID 000468819505089
-
Whole genome sequencing in psychiatric disorders: the WGSPD consortium (vol 20, pg 1661, 2017)
NATURE NEUROSCIENCE
2018; 21 (7): 1017
Abstract
In the version of this article initially published, the consortium authorship and corresponding authors were not presented correctly. In the PDF and print versions, the Whole Genome Sequencing for Psychiatric Disorders (WGSPD) consortium was missing from the author list at the beginning of the paper, where it should have appeared as the seventh author; it was present in the author list at the end of the paper, but the footnote directing readers to the Supplementary Note for a list of members was missing. In the HTML version, the consortium was listed as the last author instead of as the seventh, and the line directing readers to the Supplementary Note for a list of members appeared at the end of the paper under Author Information but not in association with the consortium name itself. Also, this line stated that both member names and affiliations could be found in the Supplementary Note; in fact, only names are given. In all versions of the paper, the corresponding author symbols were attached to A. Jeremy Willsey, Steven E. Hyman, Anjene M. Addington and Thomas Lehner; they should have been attached, respectively, to Steven E. Hyman, Anjene M. Addington, Thomas Lehner and Nelson B. Freimer. As a result of this shift, the respective contact links in the HTML version did not lead to the indicated individuals. The errors have been corrected in the HTML and PDF versions of the article.
View details for PubMedID 29549319
-
Organoid-based characterization of patient tumors and microenvironments at single cell resolution
AMER ASSOC CANCER RESEARCH. 2018
View details for DOI 10.1158/1538-7445.AM2018-987
View details for Web of Science ID 000468818902486
-
Exposure to NO2, CO, and PM2.5 is linked to regional DNA methylation differences in asthma
CLINICAL EPIGENETICS
2018; 10: 2
Abstract
DNA methylation of CpG sites on genetic loci has been linked to increased risk of asthma in children exposed to elevated ambient air pollutants (AAPs). Further identification of specific CpG sites and the pollutants that are associated with methylation of these CpG sites in immune cells could impact our understanding of asthma pathophysiology. In this study, we sought to identify some CpG sites in specific genes that could be associated with asthma regulation (Foxp3 and IL10) and to identify the different AAPs for which exposure prior to the blood draw is linked to methylation levels at these sites. We recruited subjects from Fresno, California, an area known for high levels of AAPs. Blood samples and responses to questionnaires were obtained (n = 188), and in a subset of subjects (n = 33), repeat samples were collected 2 years later. Average measures of AAPs were obtained for 1, 15, 30, 90, 180, and 365 days prior to each blood draw to estimate the short-term vs. long-term effects of the AAP exposures. Asthma was significantly associated with higher differentially methylated regions (DMRs) of the Foxp3 promoter region (p = 0.030) and the IL10 intronic region (p = 0.026). Additionally, at the 90-day time period (90 days prior to the blood draw), Foxp3 methylation was positively associated with NO2, CO, and PM2.5 exposures (p = 0.001, p = 0.001, and p = 0.012, respectively). In the subset of subjects retested 2 years later (n = 33), a positive association between AAP exposure and methylation was sustained. There was also a negative correlation between the average Foxp3 methylation of the promoter region and activated Treg levels (p = 0.039) and a positive correlation between the average IL10 methylation of region 3 of intron 4 and IL10 cytokine expression (p = 0.030). Short-term and long-term exposures to high levels of CO, NO2, and PM2.5 were associated with alterations in differentially methylated regions of Foxp3. IL10 methylation showed a similar trend. For any given individual, these changes tend to be sustained over time. In addition, asthma was associated with higher differentially methylated regions of Foxp3 and IL10.
View details for PubMedID 29317916
-
Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate
NATURE GENETICS
2017; 49 (12): 1714-+
Abstract
By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.
View details for PubMedID 29083405
View details for PubMedCentralID PMC5714271
-
Whole genome sequencing in psychiatric disorders: the WGSPD consortium
NATURE NEUROSCIENCE
2017; 20 (12): 1661–68
View details for PubMedID 29184211
-
Controlling the Rate of GWAS False Discoveries
GENETICS
2017; 205 (1): 61-75
Abstract
With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated to multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on prescreening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection both with theoretical results and simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using both tests based on single markers and multiple regression. We provide an R package that allows practitioners to apply our procedure on standard GWAS format data, and illustrate its performance on lipid traits in the North Finland Birth Cohort 66 cohort study.
View details for DOI 10.1534/genetics.116.193987
View details for Web of Science ID 000393677300004
View details for PubMedID 27784720
View details for PubMedCentralID PMC5223524
-
Genetic effects on gene expression across human tissues.
Nature
2017; 550 (7675): 204–13
Abstract
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
View details for PubMedID 29022597
-
TreeQTL: hierarchical error control for eQTL findings
BIOINFORMATICS
2016; 32 (16): 2556-2558
Abstract
Commonly used multiplicity adjustments fail to control the error rate for reported findings in many expression quantitative trait loci (eQTL) studies. TreeQTL implements a hierarchical multiple testing procedure which allows control of appropriate error rates defined relative to a grouping of the eQTL hypotheses. The R package TreeQTL is available for download at http://bioinformatics.org/treeqtl. Contact: sabatti@stanford.edu. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btw198
View details for Web of Science ID 000383682900033
View details for PubMedID 27153635
View details for PubMedCentralID PMC4978936
-
Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder
PLOS GENETICS
2016; 12 (5)
Abstract
The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus. In the eQTL analysis, we utilize a recently proposed hierarchical multiple testing strategy which controls error rates regarding the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs which may be relevant in future investigations of complex disease in this population. Since our subjects belong to extended families, we are able to compare traditional kinship-based estimates with those from more recent methods that depend only on genotype information.
View details for DOI 10.1371/journal.pgen.1006046
View details for Web of Science ID 000377197100046
View details for PubMedID 27176483
View details for PubMedCentralID PMC4866754
-
Genetic contributions to circadian activity rhythm and sleep pattern phenotypes in pedigrees segregating for severe bipolar disorder
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2016; 113 (6): E754-E761
Abstract
Abnormalities in sleep and circadian rhythms are central features of bipolar disorder (BP), often persisting between episodes. We report here, to our knowledge, the first systematic analysis of circadian rhythm activity in pedigrees segregating severe BP (BP-I). By analyzing actigraphy data obtained from members of 26 Costa Rican and Colombian pedigrees [136 euthymic (i.e., interepisode) BP-I individuals and 422 non-BP-I relatives], we delineated 73 phenotypes, of which 49 demonstrated significant heritability and 13 showed significant trait-like association with BP-I. All BP-I-associated traits related to activity level, with BP-I individuals consistently demonstrating lower activity levels than their non-BP-I relatives. We analyzed all 49 heritable phenotypes using genetic linkage analysis, with special emphasis on phenotypes judged to have the strongest impact on the biology underlying BP. We identified a locus for interdaily stability of activity, at a threshold exceeding genome-wide significance, on chromosome 12pter, a region that also showed pleiotropic linkage to two additional activity phenotypes.
View details for DOI 10.1073/pnas.1513525113
View details for Web of Science ID 000369571700012
View details for PubMedID 26712028
View details for PubMedCentralID PMC4760829
-
Genetic Variant Selection: Learning Across Traits and Sites
GENETICS
2016; 202 (2): 439-?
Abstract
We consider resequencing studies of associated loci and the problem of prioritizing sequence variants for functional follow-up. Working within the multivariate linear regression framework helps us to account for the joint effects of multiple genes; and adopting a Bayesian approach leads to posterior probabilities that coherently incorporate all information about the variants' function. We describe two novel prior distributions that facilitate learning the role of each variable site by borrowing evidence across phenotypes and across mutations in the same gene. We illustrate their potential advantages with simulations and reanalyzing a data set of sequencing variants.
View details for DOI 10.1534/genetics.115.184572
View details for Web of Science ID 000371304600010
View details for PubMedID 26680660
View details for PubMedCentralID PMC4788227
-
Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies
GENETIC EPIDEMIOLOGY
2016; 40 (1): 45-56
View details for DOI 10.1002/gepi.21942
View details for PubMedID 26626037
-
SLOPE-ADAPTIVE VARIABLE SELECTION VIA CONVEX OPTIMIZATION.
The annals of applied statistics
2015; 9 (3): 1103-1140
Abstract
We introduce a new estimator for the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to min over b ∈ ℝ^p of ½‖y − Xb‖² + λ1|b|(1) + λ2|b|(2) + … + λp|b|(p), where λ1 ≥ λ2 ≥ … ≥ λp ≥ 0 and |b|(1) ≥ |b|(2) ≥ … ≥ |b|(p) are the decreasing absolute values of the entries of b. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical ℓ1 procedures such as the Lasso. Here, the regularizer is a sorted ℓ1 norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH), which compares more significant p-values with more stringent thresholds. One notable choice of the sequence {λi} is given by the BH critical values λBH(i) = z(1 − i·q/(2p)), where q ∈ (0, 1) and z(α) is the quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with λBH provably controls FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments running on both simulated and real data.
View details for DOI 10.1214/15-AOAS842
View details for PubMedID 26709357
View details for PubMedCentralID PMC4689150
View details for Web of Science ID 000364340100001
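The SLOPE abstract above describes a sorted ℓ1 penalty paired with a Benjamini-Hochberg-style sequence of weights. The following is a minimal sketch of those two ingredients only, not the authors' implementation and not a solver: it assumes the BH sequence has the form λBH(i) = z(1 − i·q/(2p)) and that the objective is ½‖y − Xb‖² plus the sorted ℓ1 norm of b. The function names (`bh_lambdas`, `sorted_l1_norm`, `slope_objective`) are hypothetical and chosen for illustration.

```python
from statistics import NormalDist


def bh_lambdas(p, q):
    """BH-style weights: lambda_i = z(1 - i*q/(2p)) for i = 1..p (assumed form)."""
    normal = NormalDist()  # standard normal, mean 0 and sd 1
    return [normal.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(b, lambdas):
    """Sum of lambda_i * |b|_(i), with |b|_(1) >= |b|_(2) >= ... sorted decreasingly."""
    magnitudes = sorted((abs(x) for x in b), reverse=True)
    return sum(lam * mag for lam, mag in zip(lambdas, magnitudes))


def slope_objective(y, X, b, lambdas):
    """Evaluate 0.5 * ||y - Xb||^2 + sorted L1 penalty (evaluation only, no optimization)."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, b))
        for yi, row in zip(y, X)
    ]
    return 0.5 * sum(r * r for r in residuals) + sorted_l1_norm(b, lambdas)


if __name__ == "__main__":
    lambdas = bh_lambdas(p=5, q=0.1)
    print([round(lam, 3) for lam in lambdas])
    print(sorted_l1_norm([0.0, -3.0, 1.0], [3.0, 2.0, 1.0]))
```

Because the weights are non-increasing, the largest coefficient in absolute value always receives the largest weight, which is the "higher rank, larger penalty" behaviour the abstract refers to. Minimizing this objective requires a dedicated convex solver, which this sketch does not attempt.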
-
Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.
Genetics
2015; 200 (4): 1285-1295
Abstract
Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian-European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent-child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent-child pairs was largely due to intermarriage. The parent-child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.
View details for DOI 10.1534/genetics.115.178616
View details for PubMedID 26092716
View details for PubMedCentralID PMC4574246
-
Brain structure-function associations in multi-generational families genetically enriched for bipolar disorder
BRAIN
2015; 138: 2087-2102
Abstract
Recent theories regarding the pathophysiology of bipolar disorder suggest contributions of both neurodevelopmental and neurodegenerative processes. While structural neuroimaging studies indicate disease-associated neuroanatomical alterations, the behavioural correlates of these alterations have not been well characterized. Here, we investigated multi-generational families genetically enriched for bipolar disorder to: (i) characterize neurobehavioural correlates of neuroanatomical measures implicated in the pathophysiology of bipolar disorder; (ii) identify brain-behaviour associations that differ between diagnostic groups; (iii) identify neurocognitive traits that show evidence of accelerated ageing specifically in subjects with bipolar disorder; and (iv) identify brain-behaviour correlations that differ across the age span. Structural neuroimages and multi-dimensional assessments of temperament and neurocognition were acquired from 527 (153 bipolar disorder and 374 non-bipolar disorder) adults aged 18-87 years in 26 families with heavy genetic loading for bipolar disorder. We used linear regression models to identify significant brain-behaviour associations and test whether brain-behaviour relationships differed: (i) between diagnostic groups; and (ii) as a function of age. We found that total cortical and ventricular volume had the greatest number of significant behavioural associations, and included correlations with measures from multiple cognitive domains, particularly declarative and working memory and executive function. Cortical thickness measures, in contrast, showed more specific associations with declarative memory, letter fluency and processing speed tasks. 
While the majority of brain-behaviour relationships were similar across diagnostic groups, increased cortical thickness in ventrolateral prefrontal and parietal cortical regions was associated with better declarative memory only in bipolar disorder subjects, and not in non-bipolar disorder family members. Additionally, while age had a relatively strong impact on all neurocognitive traits, the effects of age on cognition did not differ between diagnostic groups. Most brain-behaviour associations were also similar across the age range, with the exception of cortical and ventricular volume and lingual gyrus thickness, which showed weak correlations with verbal fluency and inhibitory control at younger ages that increased in magnitude in older subjects, regardless of diagnosis. Findings indicate that neuroanatomical traits potentially impacted by bipolar disorder are significantly associated with multiple neurobehavioural domains. Structure-function relationships are generally preserved across diagnostic groups, with the notable exception of ventrolateral prefrontal and parietal association cortex, volumetric increases in which may be associated with cognitive resilience specifically in individuals with bipolar disorder. Although age impacted all neurobehavioural traits, we did not find any evidence of accelerated cognitive decline specific to bipolar disorder subjects. Regardless of diagnosis, greater global brain volume may represent a protective factor for the effects of ageing on executive functioning.
View details for DOI 10.1093/brain/awv106
View details for Web of Science ID 000358536600035
View details for PubMedID 25943422
-
Cross-Disorder Genome-Wide Analyses Suggest a Complex Genetic Relationship Between Tourette's Syndrome and OCD
AMERICAN JOURNAL OF PSYCHIATRY
2015; 172 (1): 82-93
Abstract
Obsessive-compulsive disorder (OCD) and Tourette's syndrome are highly heritable neurodevelopmental disorders that are thought to share genetic risk factors. However, the identification of definitive susceptibility genes for these etiologically complex disorders remains elusive. The authors report a combined genome-wide association study (GWAS) of Tourette's syndrome and OCD. The authors conducted a GWAS in 2,723 cases (1,310 with OCD, 834 with Tourette's syndrome, 579 with OCD plus Tourette's syndrome/chronic tics), 5,667 ancestry-matched controls, and 290 OCD parent-child trios. GWAS summary statistics were examined for enrichment of functional variants associated with gene expression levels in brain regions. Polygenic score analyses were conducted to investigate the genetic architecture within and across the two disorders. Although no individual single-nucleotide polymorphisms (SNPs) achieved genome-wide significance, the GWAS signals were enriched for SNPs strongly associated with variations in brain gene expression levels (expression quantitative loci, or eQTLs), suggesting the presence of true functional variants that contribute to risk of these disorders. Polygenic score analyses identified a significant polygenic component for OCD (p=2×10⁻⁴), predicting 3.2% of the phenotypic variance in an independent data set. In contrast, Tourette's syndrome had a smaller, nonsignificant polygenic component, predicting only 0.6% of the phenotypic variance (p=0.06). No significant polygenic signal was detected across the two disorders, although the sample is likely underpowered to detect a modest shared signal. Furthermore, the OCD polygenic signal was significantly attenuated when cases with both OCD and co-occurring Tourette's syndrome/chronic tics were included in the analysis (p=0.01). Previous work has shown that Tourette's syndrome and OCD have some degree of shared genetic variation.
However, the data from this study suggest that there are also distinct components to the genetic architectures of these two disorders. Furthermore, OCD with co-occurring Tourette's syndrome/chronic tics may have different underlying genetic susceptibility compared with OCD alone.
View details for DOI 10.1176/appi.ajp.2014.13101306
View details for Web of Science ID 000347146000013
View details for PubMedID 25158072
View details for PubMedCentralID PMC4282594
-
Multisystem component phenotypes of bipolar disorder for genetic investigations of extended pedigrees.
JAMA psychiatry
2014; 71 (4): 375-387
Abstract
IMPORTANCE Genetic factors contribute to risk for bipolar disorder (BP), but its pathogenesis remains poorly understood. A focus on measuring multisystem quantitative traits that may be components of BP psychopathology may enable genetic dissection of this complex disorder, and investigation of extended pedigrees from genetically isolated populations may facilitate the detection of specific genetic variants that affect BP as well as its component phenotypes. OBJECTIVE To identify quantitative neurocognitive, temperament-related, and neuroanatomical phenotypes that appear heritable and associated with severe BP (bipolar I disorder [BP-I]) and therefore suitable for genetic linkage and association studies aimed at identifying variants contributing to BP-I risk. DESIGN, SETTING, AND PARTICIPANTS Multigenerational pedigree study in 2 closely related, genetically isolated populations: the Central Valley of Costa Rica and Antioquia, Colombia. A total of 738 individuals, all from Central Valley of Costa Rica and Antioquia pedigrees, participated; among them, 181 have BP-I. MAIN OUTCOMES AND MEASURES Familial aggregation (heritability) and association with BP-I of 169 quantitative neurocognitive, temperament, magnetic resonance imaging, and diffusion tensor imaging phenotypes. RESULTS Of 169 phenotypes investigated, 126 (75%) were significantly heritable and 53 (31%) were associated with BP-I. About one-quarter of the phenotypes, including measures from each phenotype domain, were both heritable and associated with BP-I. Neuroimaging phenotypes, particularly cortical thickness in prefrontal and temporal regions as well as volume and microstructural integrity of the corpus callosum, represented the most promising candidate traits for genetic mapping related to BP based on strong heritability and association with disease. 
Analyses of phenotypic and genetic covariation identified substantial correlations among the traits, at least some of which share a common underlying genetic architecture. CONCLUSIONS AND RELEVANCE To our knowledge, this is the most extensive investigation of BP-relevant component phenotypes to date. Our results identify brain and behavioral quantitative traits that appear to be genetically influenced and show a pattern of BP-I association within families that is consistent with expectations from case-control studies. Together, these phenotypes provide a basis for identifying loci contributing to BP-I risk and for genetic dissection of the disorder.
View details for DOI 10.1001/jamapsychiatry.2013.4100
View details for PubMedID 24522887
-
Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci
PLOS GENETICS
2014; 10 (1)
Abstract
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
View details for DOI 10.1371/journal.pgen.1004147
View details for Web of Science ID 000336525000077
View details for PubMedID 24497850
View details for PubMedCentralID PMC3907339
-
Genome-wide association study of Tourette's syndrome.
Molecular psychiatry
2013; 18 (6): 721-728
Abstract
Tourette's syndrome (TS) is a developmental disorder that has one of the highest familial recurrence rates among neuropsychiatric diseases with complex inheritance. However, the identification of definitive TS susceptibility genes remains elusive. Here, we report the first genome-wide association study (GWAS) of TS in 1285 cases and 4964 ancestry-matched controls of European ancestry, including two European-derived population isolates, Ashkenazi Jews from North America and Israel and French Canadians from Quebec, Canada. In a primary meta-analysis of GWAS data from these European ancestry samples, no markers achieved a genome-wide threshold of significance (P<5 × 10⁻⁸); the top signal was found in rs7868992 on chromosome 9q32 within COL27A1 (P=1.85 × 10⁻⁶). A secondary analysis including an additional 211 cases and 285 controls from two closely related Latin American population isolates from the Central Valley of Costa Rica and Antioquia, Colombia also identified rs7868992 as the top signal (P=3.6 × 10⁻⁷ for the combined sample of 1496 cases and 5249 controls following imputation with 1000 Genomes data). This study lays the groundwork for the eventual identification of common TS susceptibility variants in larger cohorts and helps to provide a more complete understanding of the full genetic architecture of this disorder.
View details for DOI 10.1038/mp.2012.69
View details for PubMedID 22889924
View details for PubMedCentralID PMC3605224
-
Increased paternal age and the influence on burden of genomic copy number variation in the general population
HUMAN GENETICS
2013; 132 (4): 443-450
Abstract
Genomic copy number variations (CNVs) and increased parental age are both associated with the risk to develop a variety of clinical neuropsychiatric disorders such as autism, schizophrenia and bipolar disorder. At the same time, it has been shown that the rate of transmitted de novo single nucleotide mutations is increased with paternal age. To address whether paternal age also affects the burden of structural genomic deletions and duplications, we examined various types of CNV burden in a large population sample from the Netherlands. Healthy participants with parental age information (n = 6,773) were collected at different University Medical Centers. CNVs were called with the PennCNV algorithm using Illumina genome-wide SNP array data. We observed no evidence in support of a paternal age effect on CNV load in the offspring. Our results were negative for global measures as well as several proxies for de novo CNV events in this unique sample. While recent studies suggest de novo single nucleotide mutation rate to be dominated by the age of the father at conception, our results strongly suggest that at the level of global CNV burden there is no influence of increased paternal age. While it remains possible that local genomic effects may exist for specific phenotypes, this study indicates that global CNV burden and increased father's age may be independent disease risk factors.
View details for DOI 10.1007/s00439-012-1261-4
View details for Web of Science ID 000316345400008
View details for PubMedID 23315237
-
Reconstructing DNA copy number by joint segmentation of multiple sequences
BMC BIOINFORMATICS
2012; 13
Abstract
Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions. GFL is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets. The flexibility of our framework makes it applicable to data obtained with a wide range of technology. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
View details for DOI 10.1186/1471-2105-13-205
View details for Web of Science ID 000313262200001
View details for PubMedID 22897923
View details for PubMedCentralID PMC3534631
-
A genome-wide meta-analysis of association studies of Cloninger's Temperament Scales
TRANSLATIONAL PSYCHIATRY
2012; 2
Abstract
Temperament has a strongly heritable component, yet multiple independent genome-wide studies have failed to identify significant genetic associations. We have assembled the largest sample to date of persons with genome-wide genotype data, who have been assessed with Cloninger's Temperament and Character Inventory. Sum scores for novelty seeking, harm avoidance, reward dependence and persistence have been measured in over 11,000 persons collected in four different cohorts. Our study had >80% power to identify genome-wide significant loci (P<1.25 × 10⁻⁸, with correction for testing four scales) accounting for ≥0.4% of the phenotypic variance in temperament scales. Using meta-analysis techniques, gene-based tests and pathway analysis we have tested over 1.2 million single-nucleotide polymorphisms (SNPs) for association to each of the four temperament dimensions. We did not discover any SNPs, genes, or pathways to be significantly related to the four temperament dimensions, after correcting for multiple testing. Less than 1% of the variability in any temperament dimension appears to be accounted for by a risk score derived from the SNPs showing strongest association to the temperament dimensions. Elucidation of genetic loci significantly influencing temperament and personality will require potentially very large samples, and/or a more refined phenotype. Item response theory methodology may be a way to incorporate data from cohorts assessed with multiple personality instruments, and might be a method by which a large sample of a more refined phenotype could be acquired.
View details for DOI 10.1038/tp.2012.37
View details for Web of Science ID 000312895700009
View details for PubMedID 22832960
-
Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals.
PLoS genetics
2012; 8 (3)
Abstract
Circulating levels of adiponectin, a hormone produced predominantly by adipocytes, are highly heritable and are inversely associated with type 2 diabetes mellitus (T2D) and other metabolic traits. We conducted a meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry to identify genes associated with metabolic disease. We identified 8 novel loci associated with adiponectin levels and confirmed 2 previously reported loci (P = 4.5×10⁻⁸ to 1.2×10⁻⁴³). Using a novel method to combine data across ethnicities (N = 4,232 African Americans, N = 1,776 Asians, and N = 29,347 Europeans), we identified two additional novel loci. Expression analyses of 436 human adipocyte samples revealed that mRNA levels of 18 genes at candidate regions were associated with adiponectin concentrations after accounting for multiple testing (p<3×10⁻⁴). We next developed a multi-SNP genotypic risk score to test the association of adiponectin decreasing risk alleles on metabolic traits and diseases using consortia-level meta-analytic data. This risk score was associated with increased risk of T2D (p = 4.3×10⁻³, n = 22,044), increased triglycerides (p = 2.6×10⁻¹⁴, n = 93,440), increased waist-to-hip ratio (p = 1.8×10⁻⁵, n = 77,167), increased glucose two hours post oral glucose tolerance testing (p = 4.4×10⁻³, n = 15,234), increased fasting insulin (p = 0.015, n = 48,238), but with lower HDL-cholesterol concentrations (p = 4.5×10⁻¹³, n = 96,748) and decreased BMI (p = 1.4×10⁻⁴, n = 121,335). These findings identify novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance.
View details for DOI 10.1371/journal.pgen.1002607
View details for PubMedID 22479202
-
Genome-Wide Analysis Shows Increased Frequency of Copy Number Variation Deletions in Dutch Schizophrenia Patients
BIOLOGICAL PSYCHIATRY
2011; 70 (7): 655-662
Abstract
Since 2008, multiple studies have reported on copy number variations (CNVs) in schizophrenia. However, many regions are unique events with minimal overlap between studies. This makes it difficult to gain a comprehensive overview of all CNVs involved in the etiology of schizophrenia. We performed a systematic CNV study on the basis of a homogeneous genome-wide dataset aiming at all CNVs ≥ 50 kilobase pair. We complemented this analysis with a review of cytogenetic and chromosomal abnormalities for schizophrenia reported in the literature with the purpose of combining classical genetic findings and our current understanding of genomic variation. We investigated 834 Dutch schizophrenia patients and 672 Dutch control subjects. The CNVs were included if they were detected by QuantiSNP (http://www.well.ox.ac.uk/QuantiSNP/) as well as PennCNV (http://www.neurogenome.org/cnv/penncnv/) and contain known protein coding genes. The integrated identification of CNV regions and cytogenetic loci indicates regions of interest (cytogenetic regions of interest [CROIs]). In total, 2437 CNVs were identified with an average number of 2.1 CNVs/subject for both cases and control subjects. We observed significantly more deletions but not duplications in schizophrenia cases versus control subjects. The CNVs identified coincide with loci previously reported in the literature, confirming well-established schizophrenia CROIs 1q42 and 22q11.2 as well as indicating a potentially novel CROI on chromosome 5q35.1. Chromosomal deletions are more prevalent in schizophrenia patients than in healthy subjects and therefore confer a risk factor for pathogenicity. The combination of our CNV data with previously reported cytogenetic abnormalities in schizophrenia provides an overview of potentially interesting regions for positional candidate genes.
View details for DOI 10.1016/j.biopsych.2011.02.015
View details for Web of Science ID 000295595800009
View details for PubMedID 21489405
-
A Molecular Screening Approach to Identify and Characterize Inhibitors of Glioblastoma Stem Cells
MOLECULAR CANCER THERAPEUTICS
2011; 10 (10): 1818-1828
Abstract
Glioblastoma (GBM) is among the most lethal of all cancers. GBMs consist of a heterogeneous population of tumor cells, among which a tumor-initiating and treatment-resistant subpopulation, here termed GBM stem cells, has been identified as a primary therapeutic target. Here, we describe a high-throughput small molecule screening approach that enables the identification and characterization of chemical compounds that are effective against GBM stem cells. The paradigm uses a tissue culture model to enrich for GBM stem cells derived from human GBM resections and combines a phenotype-based screen with gene target-specific screens for compound identification. We used 31,624 small molecules from 7 chemical libraries that we characterized and ranked based on their effect on a panel of GBM stem cell-enriched cultures and their effect on the expression of a module of genes whose expression negatively correlates with clinical outcome: MELK, ASPM, TOP2A, and FOXM1b. Of the 11 compounds meeting criteria for exerting differential effects across cell types used, 4 compounds showed selectivity by inhibiting multiple GBM stem cell-enriched cultures compared with nonenriched cultures: emetine, n-arachidonoyl dopamine, n-oleoyldopamine (OLDA), and n-palmitoyl dopamine. ChemBridge compounds #5560509 and #5256360 inhibited the expression of the 4 mitotic module genes. OLDA, emetine, and compounds #5560509 and #5256360 were chosen for more detailed study and inhibited GBM stem cells in self-renewal assays in vitro and in a xenograft model in vivo. These studies show that our screening strategy provides potential candidates and a blueprint for lead compound identification in larger scale screens or screens involving other cancer types.
View details for DOI 10.1158/1535-7163.MCT-11-0268
View details for Web of Science ID 000295968200006
View details for PubMedID 21859839
-
Phenotype mining in CNV carriers from a population cohort
HUMAN MOLECULAR GENETICS
2011; 20 (13): 2686-2695
Abstract
Phenotype mining is a novel approach for elucidating the genetic basis of complex phenotypic variation. It involves a search of rich phenotype databases for measures correlated with genetic variation, as identified in genome-wide genotyping or sequencing studies. An initial implementation of phenotype mining in a prospective unselected population cohort, the Northern Finland 1966 Birth Cohort (NFBC1966), identifies neurodevelopment-related traits (intellectual deficits, poor school performance and hearing abnormalities) that are more frequent among individuals with large (>500 kb) deletions than among other cohort members. Observation of extensive shared single nucleotide polymorphism haplotypes around deletions suggests an opportunity to expand phenotype mining from cohort samples to the populations from which they derive.
View details for DOI 10.1093/hmg/ddr162
View details for Web of Science ID 000291527000018
View details for PubMedID 21505072
-
Reconstructing DNA Copy Number by Penalized Estimation and Imputation
ANNALS OF APPLIED STATISTICS
2010; 4 (4): 1749-1773
Abstract
Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18-29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost.
View details for DOI 10.1214/10-AOAS357
View details for Web of Science ID 000295451000013
-
Biological, clinical and population relevance of 95 loci for blood lipids
NATURE
2010; 466 (7307): 707-713
Abstract
Plasma concentrations of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with plasma lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 x 10(-8)), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (for example, CYP7A1, NPC1L1 and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and have an impact on lipid traits in three non-European populations (East Asians, South Asians and African Americans). Our results identify several novel loci associated with plasma lipids that are also associated with CAD. Finally, we validated three of the novel genes (GALNT2, PPP1R3B and TTC39B) with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.
View details for DOI 10.1038/nature09270
View details for Web of Science ID 000280562500029
View details for PubMedID 20686565
-
Sparse Regulatory Networks
ANNALS OF APPLIED STATISTICS
2010; 4 (2): 663-686
Abstract
In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses L(1) penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.
View details for DOI 10.1214/10-AOAS350
View details for Web of Science ID 000283528500007
View details for PubMedID 21625366
View details for PubMedCentralID PMC3102251
-
Variance component model to account for sample structure in genome-wide association studies
NATURE GENETICS
2010; 42 (4): 348-U110
Abstract
Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.
View details for DOI 10.1038/ng.548
View details for Web of Science ID 000276150500016
View details for PubMedID 20208533
-
The dysbindin-containing complex (BLOC-1) in brain: developmental regulation, interaction with SNARE proteins and role in neurite outgrowth
MOLECULAR PSYCHIATRY
2010; 15 (2): 204-215
Abstract
Previous studies have implicated DTNBP1 as a schizophrenia susceptibility gene and its encoded protein, dysbindin, as a potential regulator of synaptic vesicle physiology. In this study, we found that endogenous levels of the dysbindin protein in the mouse brain are developmentally regulated, with higher levels observed during embryonic and early postnatal ages than in young adulthood. We obtained biochemical evidence indicating that the bulk of dysbindin from brain exists as a stable component of biogenesis of lysosome-related organelles complex-1 (BLOC-1), a multi-subunit protein complex involved in intracellular membrane trafficking and organelle biogenesis. Selective biochemical interaction between brain BLOC-1 and a few members of the SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) superfamily of proteins that control membrane fusion, including SNAP-25 and syntaxin 13, was demonstrated. Furthermore, primary hippocampal neurons deficient in BLOC-1 displayed neurite outgrowth defects. Taken together, these observations suggest a novel role for the dysbindin-containing complex, BLOC-1, in neurodevelopment, and provide a framework for considering potential effects of allelic variants in DTNBP1--or in other genes encoding BLOC-1 subunits--in the context of the developmental model of schizophrenia pathogenesis.
View details for DOI 10.1038/mp.2009.58
View details for Web of Science ID 000273876000012
View details for PubMedID 19546860
-
A Narrow and Highly Significant Linkage Signal for Severe Bipolar Disorder In the Chromosome 5q33 Region in Latin American Pedigrees
AMERICAN JOURNAL OF MEDICAL GENETICS PART B-NEUROPSYCHIATRIC GENETICS
2009; 150B (7): 998-1006
Abstract
We previously reported linkage of bipolar disorder to 5q33-q34 in families from two closely related population isolates, the Central Valley of Costa Rica (CVCR) and Antioquia, Colombia (CO). Here we present follow up results from fine-scale mapping in large CVCR and CO families segregating severe bipolar disorder, BP-I, and in 343 population trios/duos from CVCR and CO. Employing densely spaced SNPs to fine map the prior linkage peak region increases linkage evidence and clarifies the position of the putative BP-I locus. We performed two-point linkage analysis with 1134 SNPs in an approximately 9 Mb region between markers D5S410 and D5S422. Combining pedigrees from CVCR and CO yields a LOD score of 4.9 at SNP rs10035961. Two other SNPs (rs7721142 and rs1422795) within the same 94 kb region also displayed LOD scores greater than 4. This linkage peak coincides with our prior microsatellite results and suggests a narrowed BP-I susceptibility region in these families. To investigate if the locus implicated in the familial form of BP-I also contributes to disease risk in the population, we followed up the family results with association analysis in duo and trio samples, obtaining signals within 2 Mb of the peak linkage signal in the pedigrees; rs12523547 and rs267015 (P = 0.00004 and 0.00016, respectively) in the CO sample and rs244960 in the CVCR sample and the combined sample, with P = 0.00032 and 0.00016, respectively. It remains unclear whether these association results reflect the same locus contributing to BP susceptibility within the extended pedigrees.
View details for DOI 10.1002/ajmg.b.30956
View details for Web of Science ID 000270441100014
View details for PubMedID 19319892
-
Robust discrimination between self and non-self neurites requires thousands of Dscam1 isoforms
NATURE
2009; 461 (7264): 644-U87
Abstract
Down Syndrome cell adhesion molecule (Dscam) genes encode neuronal cell recognition proteins of the immunoglobulin superfamily. In Drosophila, Dscam1 generates 19,008 different ectodomains by alternative splicing of three exon clusters, each encoding half or a complete variable immunoglobulin domain. Identical isoforms bind to each other, but rarely to isoforms differing at any one of the variable immunoglobulin domains. Binding between isoforms on opposing membranes promotes repulsion. Isoform diversity provides the molecular basis for neurite self-avoidance. Self-avoidance refers to the tendency of branches from the same neuron (self-branches) to selectively avoid one another. To ensure that repulsion is restricted to self-branches, different neurons express different sets of isoforms in a biased stochastic fashion. Genetic studies demonstrated that Dscam1 diversity has a profound role in wiring the fly brain. Here we show how many isoforms are required to provide an identification system that prevents non-self branches from inappropriately recognizing each other. Using homologous recombination, we generated mutant animals encoding 12, 24, 576 and 1,152 potential isoforms. Mutant animals with deletions encoding 4,752 and 14,256 isoforms were also analysed. Branching phenotypes were assessed in three classes of neurons. Branching patterns improved as the potential number of isoforms increased, and this was independent of the identity of the isoforms. Although branching defects in animals with 1,152 potential isoforms remained substantial, animals with 4,752 isoforms were indistinguishable from wild-type controls. Mathematical modelling studies were consistent with the experimental results that thousands of isoforms are necessary to ensure acquisition of unique Dscam1 identities in many neurons. We conclude that thousands of isoforms are essential to provide neurons with a robust discrimination mechanism to distinguish between self and non-self during self-avoidance.
View details for DOI 10.1038/nature08431
View details for Web of Science ID 000270302600039
View details for PubMedID 19794492
-
Disruption of the neurexin 1 gene is associated with schizophrenia
HUMAN MOLECULAR GENETICS
2009; 18 (5): 988-996
Abstract
Deletions within the neurexin 1 gene (NRXN1; 2p16.3) are associated with autism and have also been reported in two families with schizophrenia. We examined NRXN1, and the closely related NRXN2 and NRXN3 genes, for copy number variants (CNVs) in 2977 schizophrenia patients and 33 746 controls from seven European populations (Iceland, Finland, Norway, Germany, The Netherlands, Italy and UK) using microarray data. We found 66 deletions and 5 duplications in NRXN1, including a de novo deletion: 12 deletions and 2 duplications occurred in schizophrenia cases (0.47%) compared to 49 and 3 (0.15%) in controls. There was no common breakpoint and the CNVs varied from 18 to 420 kb. No CNVs were found in NRXN2 or NRXN3. We performed a Cochran-Mantel-Haenszel exact test to estimate association between all CNVs and schizophrenia (P = 0.13; OR = 1.73; 95% CI 0.81-3.50). Because the penetrance of NRXN1 CNVs may vary according to the level of functional impact on the gene, we next restricted the association analysis to CNVs that disrupt exons (0.24% of cases and 0.015% of controls). These were significantly associated with a high odds ratio (P = 0.0027; OR 8.97, 95% CI 1.8-51.9). We conclude that NRXN1 deletions affecting exons confer risk of schizophrenia.
View details for DOI 10.1093/hmg/ddn351
View details for Web of Science ID 000263409100017
View details for PubMedID 18945720
-
Genome-wide association analysis of metabolic traits in a birth cohort from a founder population
NATURE GENETICS
2009; 41 (1): 35-46
Abstract
Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
View details for DOI 10.1038/ng.271
View details for Web of Science ID 000262085300014
View details for PubMedID 19060910
-
Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts
NATURE GENETICS
2009; 41 (1): 47-55
Abstract
Recent genome-wide association (GWA) studies of lipids have been conducted in samples ascertained for other phenotypes, particularly diabetes. Here we report the first GWA analysis of loci affecting total cholesterol (TC), low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglycerides sampled randomly from 16 population-based cohorts and genotyped using mainly the Illumina HumanHap300-Duo platform. Our study included a total of 17,797-22,562 persons, aged 18-104 years and from geographic regions spanning from the Nordic countries to Southern Europe. We established 22 loci associated with serum lipid levels at a genome-wide significance level (P < 5 x 10(-8)), including 16 loci that were identified by previous GWA studies. The six newly identified loci in our cohort samples are ABCG5 (TC, P = 1.5 x 10(-11); LDL, P = 2.6 x 10(-10)), TMEM57 (TC, P = 5.4 x 10(-10)), CTCF-PRMT8 region (HDL, P = 8.3 x 10(-16)), DNAH11 (LDL, P = 6.1 x 10(-9)), FADS3-FADS2 (TC, P = 1.5 x 10(-10); LDL, P = 4.4 x 10(-13)) and MADD-FOLH1 region (HDL, P = 6 x 10(-11)). For three loci, effect sizes differed significantly by sex. Genetic risk scores based on lipid loci explain up to 4.8% of variation in lipids and were also associated with increased intima media thickness (P = 0.001) and coronary heart disease incidence (P = 0.04). The genetic risk score improves the screening of high-risk groups of dyslipidemia over classical risk factors.
View details for DOI 10.1038/ng.269
View details for Web of Science ID 000262085300015
View details for PubMedID 19060911
-
Markov Models for Inferring Copy Number Variations from Genotype Data on Illumina Platforms
HUMAN HEREDITY
2009; 68 (1): 1-22
Abstract
Illumina genotyping arrays provide information on DNA copy number. Current methodology for their analysis assumes linkage equilibrium across adjacent markers. This is unrealistic, given the markers' high density, and can result in reduced specificity. Another limitation of current methods is that they cannot be directly applied to the analysis of multiple samples with the goal of detecting copy number polymorphisms and their association with traits of interest. We propose a new Hidden Markov Model for Illumina genotype data that takes into account linkage disequilibrium between adjacent loci. Our framework also allows for location specific deletion/duplication rates. When multiple samples are available, we describe a methodology for their analysis that simultaneously reconstructs the copy number states in each sample and identifies genomic locations with increased variability in copy number in the population. This approach can be extended to test association between copy number variants and a disease trait. We show that taking into account linkage disequilibrium between adjacent markers can increase the specificity of a HMM in reconstructing copy number variants, especially single copy deletions. Our multisample approach is computationally practical and can increase the power of association studies.
View details for DOI 10.1159/000210445
View details for Web of Science ID 000265122300001
View details for PubMedID 19339782
-
Recurrent CNVs Disrupt Three Candidate Genes in Schizophrenia Patients
AMERICAN JOURNAL OF HUMAN GENETICS
2008; 83 (4): 504-510
Abstract
Schizophrenia is a severe psychiatric disease with complex etiology, affecting approximately 1% of the general population. Most genetics studies so far have focused on disease association with common genetic variation, such as single-nucleotide polymorphisms (SNPs), but it has recently become apparent that large-scale genomic copy-number variants (CNVs) are involved in disease development as well. To assess the role of rare CNVs in schizophrenia, we screened 54 patients with deficit schizophrenia using Affymetrix's GeneChip 250K SNP arrays. We identified 90 CNVs in total, 77 of which have been reported previously in unaffected control cohorts. Among the genes disrupted by the remaining rare CNVs are MYT1L, CTNND2, NRXN1, and ASTN2, genes that play an important role in neuronal functioning but--except for NRXN1--have not been associated with schizophrenia before. We studied the occurrence of CNVs at these four loci in an additional cohort of 752 patients and 706 normal controls from The Netherlands. We identified eight additional CNVs, of which the four that affect coding sequences were found only in the patient cohort. Our study supports a role for rare CNVs in schizophrenia susceptibility and identifies at least three candidate genes for this complex disorder.
View details for DOI 10.1016/j.ajhg.2008.09.011
View details for Web of Science ID 000260239200008
View details for PubMedID 18940311
-
Large recurrent microdeletions associated with schizophrenia
NATURE
2008; 455 (7210): 232-U61
Abstract
Reduced fecundity, associated with severe mental disorders, places negative selection pressure on risk alleles and may explain, in part, why common variants have not been found that confer risk of disorders such as autism, schizophrenia and mental retardation. Thus, rare variants may account for a larger fraction of the overall genetic risk than previously assumed. In contrast to rare single nucleotide mutations, rare copy number variations (CNVs) can be detected using genome-wide single nucleotide polymorphism arrays. This has led to the identification of CNVs associated with mental retardation and autism. In a genome-wide search for CNVs associating with schizophrenia, we used a population-based sample to identify de novo CNVs by analysing 9,878 transmissions from parents to offspring. The 66 de novo CNVs identified were tested for association in a sample of 1,433 schizophrenia cases and 33,250 controls. Three deletions at 1q21.1, 15q11.2 and 15q13.3 showing nominal association with schizophrenia in the first sample (phase I) were followed up in a second sample of 3,285 cases and 7,951 controls (phase II). All three deletions significantly associate with schizophrenia and related psychoses in the combined sample. The identification of these rare, recurrent risk variants, having occurred independently in multiple founders and being subject to negative selection, is important in itself. CNV analysis may also point the way to the identification of additional and more prevalent risk variants in genes and pathways involved in schizophrenia.
View details for DOI 10.1038/nature07229
View details for Web of Science ID 000259090800049
View details for PubMedID 18668039
-
Clinical features and associated syndromes of mal de debarquement
JOURNAL OF NEUROLOGY
2008; 255 (7): 1038-1044
Abstract
To investigate the clinical features and natural history of mal de debarquement (MdD). Retrospective case review with follow-up questionnaire and telephone interviews. University Neurotology Clinic. Patients seen between 1980 and 2006 who developed a persistent sensation of rocking or swaying for at least 3 days after exposure to passive motion. Clinical features, diagnostic testing, and questionnaire responses. Of 64 patients (75% women) identified with MdD, 34 completed follow-up questionnaires and interviews in 2006. Most patients had normal neurological exams, ENGs and brain MRIs. The average age of the first MdD episode was 39 ± 13 years. A total of 206 episodes were experienced by 64 patients. Of these, 104 episodes (51%) lasted >1 month; 18%, >1 year; 15%, >2 years; 12%, >4 years, and 11%, >5 years. Eighteen patients (28%) subsequently developed spontaneous episodes of MdD-like symptoms after the initial MdD episode. There was a much higher rate of migraine in patients who went on to develop spontaneous episodes (73%) than in those who did not (22%). Subsequent episodes were longer than earlier ones in most patients who had multiple episodes. Re-exposure to passive motion temporarily decreased symptoms in most patients (66%). Subjective intolerance to visual motion increased (10% to 66%) but self-motion sensitivity did not (37% to 50%) with onset of MdD. The majority of MdD episodes lasting longer than 3 days resolve in less than one year but the probability of resolution declines each year. Many patients experience multiple MdD episodes. Some patients develop spontaneous episodes after the initial motion-triggered episode with migraine being a risk factor.
View details for DOI 10.1007/s00415-008-0837-3
View details for Web of Science ID 000258025000014
View details for PubMedID 18500497
-
Bayesian Gaussian mixture models for high-density genotyping arrays
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2008; 103 (481): 89-100
Abstract
Affymetrix's SNP (single-nucleotide polymorphism) genotyping chips have increased the scope and decreased the cost of gene-mapping studies. Because each SNP is queried by multiple DNA probes, the chips present interesting challenges in genotype calling. Traditional clustering methods distinguish the three genotypes of an SNP fairly well given a large enough sample of unrelated individuals or a training sample of known genotypes. This article describes our attempt to improve genotype calling by constructing Gaussian mixture models with empirically derived priors. The priors stabilize parameter estimation and borrow information collectively gathered on tens of thousands of SNPs. When data from related family members are available, our models capture the correlations in signals between relatives. With these advantages in mind, we apply the models to Affymetrix probe intensity data on 10,000 SNPs gathered on 63 genotyped individuals spread over eight pedigrees. We integrate the genotype-calling model with pedigree analysis and examine a sequence of symmetry hypotheses involving the correlated probe signals. The symmetry hypotheses raise novel mathematical issues of parameterization. Using the Bayesian information criterion, we select the best combination of symmetry assumptions. Compared to Affymetrix's software, our model leads to a reduction in no-calls with little sacrifice in overall calling accuracy.
View details for DOI 10.1198/016214507000000338
View details for Web of Science ID 000254311500016
-
A dictionary model for haplotyping, genotype calling, and association testing
GENETIC EPIDEMIOLOGY
2007; 31 (7): 672-683
Abstract
We propose a new method for haplotyping, genotype calling, and association testing based on a dictionary model for haplotypes. In this framework, a haplotype arises as a concatenation of conserved haplotype segments, drawn from a predefined dictionary according to segment-specific probabilities. The observed data consist of unphased multimarker genotypes gathered on a random sample of unrelated individuals. These genotypes are subject to mutation, genotyping errors, and missing data. The true pair of haplotypes corresponding to a person's multimarker genotype is reconstructed using a Markov chain that visits haplotype pairs according to their posterior probabilities. Our implementation of the chain alternates Gibbs steps, which rearrange the phase of a single marker, and Metropolis steps, which swap maternal and paternal haplotypes from a given marker onward. Outputs of the chain include the most likely haplotype pairs, the most likely genotypes at each marker, and the expected number of occurrences of each haplotype segment. Reconstruction accuracy is comparable to that achieved by the best existing algorithms. More importantly, the dictionary model yields expected counts of conserved haplotype segments. These imputed counts can serve as genetic predictors in association studies, as we illustrate by examples on cystic fibrosis, Friedreich's ataxia, and angiotensin-I converting enzyme levels.
View details for DOI 10.1002/gepi.20232
View details for Web of Science ID 000250904800002
View details for PubMedID 17487885
-
Tag SNPs chosen from HapMap perform well in several population isolates
GENETIC EPIDEMIOLOGY
2007; 31 (3): 189-194
Abstract
Population isolates may be particularly useful for association studies of complex traits. This utility, however, largely depends on the transferability of tag SNPs chosen from reference samples, such as HapMap, to samples from such populations. Factors that characterize population isolates, such as widespread genetic drift, could impede such transferability. In this report, we show that tag SNPs chosen from HapMap perform well in several population isolates; this is true even for populations that differ substantially from the HapMap sample either in levels of linkage disequilibrium or in SNP allele frequency distributions.
View details for DOI 10.1002/gepi.20201
View details for Web of Science ID 000245128200002
View details for PubMedID 17323370
-
Human genetics - Variants in common diseases
NATURE
2007; 445 (7130): 828-830
View details for DOI 10.1038/nature05568
View details for Web of Science ID 000244341200028
View details for PubMedID 17293879
-
Genome scan for Tourette disorder in affected-sibling-pair and multigenerational families
AMERICAN JOURNAL OF HUMAN GENETICS
2007; 80 (2): 265-272
Abstract
Tourette disorder (TD) is a neuropsychiatric disorder with a complex mode of inheritance and is characterized by multiple waxing and waning motor and phonic tics. This article reports the results of the largest genetic linkage study yet undertaken for TD. The sample analyzed includes 238 nuclear families yielding 304 "independent" sibling pairs and 18 separate multigenerational families, for a total of 2,040 individuals. A whole-genome screen with the use of 390 microsatellite markers was completed. Analyses were completed using two diagnostic classifications: (1) only individuals with TD were included as affected and (2) individuals with either TD or chronic-tic (CT) disorder were included as affected. Strong evidence of linkage was observed for a region on chromosome 2p (-log P = 4.42, P = 3.8 x 10^-5) in the analyses that included individuals with TD or CT disorder as affected. Results in several other regions also provide moderate evidence (-log P >2.0) of additional susceptibility loci for TD.
View details for Web of Science ID 000243729500006
View details for PubMedID 17304708
-
The relevance of migraine in patients with Meniere's disease
ACTA OTO-LARYNGOLOGICA
2007; 127 (12): 1241-1245
Abstract
Coexistent migraine affects relevant clinical features of patients with Ménière's disease (MD). Epidemiological studies have shown an association between migraine and MD. We sought to determine whether the coexistence of migraine affects any clinical features in patients with MD. In this retrospective case-control study of University Neurotology Clinic patients, 50 patients meeting 1995 AAO-HNS criteria for definite MD were compared to 18 patients meeting the same criteria in addition to the 2004 IHS criteria for migraine (MMD). All had typical low frequency sensorineural hearing loss and episodes of rotational vertigo. Outcome measures included: sex, age of onset of episodic vertigo or fluctuating hearing loss, laterality of hearing loss, aural symptoms, caloric responses, severity of hearing loss, and family history of migraine, episodic vertigo or hearing loss. Age of onset of episodic vertigo or fluctuating hearing loss was significantly lower in patients with MMD (mean +/- 1.96*SE = 37.2 +/- 6.3 years) than in those with MD (mean +/- 1.96*SE = 49.3 +/- 4.4 years). Concurrent bilateral aural symptoms and hearing loss were seen in 56% of MMD and 4% of MD patients. A family history of episodic vertigo was seen in 39% of MMD and 2% of MD patients.
View details for DOI 10.1080/00016480701242469
View details for Web of Science ID 000251240600002
View details for PubMedID 17851970
-
Avoiding false discoveries in association studies.
Methods in molecular biology (Clifton, N.J.)
2007; 376: 195-211
Abstract
We consider the problem of controlling false discoveries in association studies. We assume that the design of the study is adequate, so that any "false discoveries" are due only to random chance, not to confounding or other flaws. Under this premise, we review the statistical framework for hypothesis testing and correction for multiple comparisons. We consider in detail the currently accepted strategies in linkage analysis. We then examine the underlying similarities and differences between linkage and association studies and document some of the most recent methodological developments for association mapping.
View details for PubMedID 17984547
-
Volume measures for linkage disequilibrium
BMC GENETICS
2006; 7
Abstract
Defining measures of linkage disequilibrium (LD) that have good small sample properties and are applicable to multiallelic markers poses some challenges. The potential of volume measures in this context has been noted before, but their use has been hampered by computational challenges. We design a sequential importance sampling algorithm to evaluate volume measures on I x J tables. The algorithm is implemented in a C routine as a complement to exhaustive enumeration. We make the C code available as open source. We achieve fast and accurate evaluation of volume measures in two dimensional tables. Applying our code to simulated and real datasets reinforces the belief that volume measures are a very useful tool for LD evaluation: they are not inflated in small samples, their definition encompasses multiallelic markers, and they can be computed with appreciable speed.
View details for DOI 10.1186/1471-2156-7-54
View details for Web of Science ID 000242380800001
View details for PubMedID 17112381
-
Overrepresentation of rare variants in a specific ethnic group may confuse interpretation of association analyses
HUMAN MOLECULAR GENETICS
2006; 15 (22): 3324-3328
Abstract
Rare sequence variants may be important in understanding the biology of common diseases, but clearly establishing their association with disease is often difficult. Association studies of such variants are becoming increasingly common as large-scale sequence analysis of candidate genes has become feasible. A recent report suggested SLITRK1 (Slit and Trk-like 1) as a candidate gene for Tourette Syndrome (TS). The statistical evidence for this suggestion came from association analyses of a rare 3'-UTR variant, var321, which was observed in two patients but not observed in more than 2000 controls. We genotyped 307 Costa Rican and 515 Ashkenazi individuals (TS probands and their parents) and observed var321 in five independent Ashkenazi parents, two of whom did not transmit this variant to their affected child. Furthermore, we identified var321 in one subject from an Ashkenazi control sample. Our findings do not support the previously reported association and suggest that var321 is overrepresented among Ashkenazi Jews compared with other populations of European origin. The results further suggest that overrepresentation of rare variants in a specific ethnic group may complicate the interpretation of association analyses of such variants, highlighting the particular importance of precisely matching case and control populations for association analyses of rare variants.
View details for DOI 10.1093/hmg/ddl408
View details for Web of Science ID 000241629900006
View details for PubMedID 17035247
-
Convergent linkage evidence from two Latin-American population isolates supports the presence of a susceptibility locus for bipolar disorder in 5q31-34
HUMAN MOLECULAR GENETICS
2006; 15 (21): 3146-3153
Abstract
We performed a whole genome microsatellite marker scan in six multiplex families with bipolar (BP) mood disorder ascertained in Antioquia, a historically isolated population from North West Colombia. These families were characterized clinically using the approach employed in independent ongoing studies of BP in the closely related population of the Central Valley of Costa Rica. The most consistent linkage results from parametric and non-parametric analyses of the Colombian scan involved markers on 5q31-33, a region implicated by the previous studies of BP in Costa Rica. Because of these concordant results, a follow-up study with additional markers was undertaken in an expanded set of Colombian and Costa Rican families; this provided genome-wide significant evidence of linkage of BPI to a candidate region of approximately 10 cM in 5q31-33 (maximum non-parametric linkage score=4.395, P<0.00004). Interestingly, this region has been implicated in several previous genetic studies of schizophrenia and psychosis, including disease association with variants of the enthoprotin and gamma-aminobutyric acid receptor genes.
View details for DOI 10.1093/hmg/ddl254
View details for Web of Science ID 000241430000006
View details for PubMedID 16984960
-
Results of a SNP genome screen in a large Costa Rican pedigree segregating for severe bipolar disorder
AMERICAN JOURNAL OF MEDICAL GENETICS PART B-NEUROPSYCHIATRIC GENETICS
2006; 141B (4): 367-373
Abstract
We have ascertained in the Central Valley of Costa Rica a new kindred (CR201) segregating for severe bipolar disorder (BP-I). The family was identified by tracing genealogical connections among eight persons initially independently ascertained for a genome wide association study of BP-I. For the genome screen in CR201, we trimmed the family down to 168 persons (82 of whom are genotyped), containing 25 individuals with a best-estimate diagnosis of BP-I. A total of 4,690 SNP markers were genotyped. Analysis of the data was hampered by the size and complexity of the pedigree, which prohibited using exact multipoint methods on the entire kindred. Two-point parametric linkage analysis, using a conservative model of transmission, produced a maximum LOD score of 2.78 on chromosome 6, and a total of 39 loci with LOD scores >1.0. Multipoint parametric and non-parametric linkage analysis was performed separately on four sections of CR201, and interesting (nominal P-value from either analysis <0.01), although not statistically significant, regions were highlighted on chromosomes 1, 2, 3, 12, 16, 19, and 22, in at least one section of the pedigree, or when considering all sections together. The difficulties of analyzing genome wide SNP data for complex disorders in large, potentially informative, kindreds are discussed.
View details for DOI 10.1002/ajmg.b.30323
View details for Web of Science ID 000238054200008
View details for PubMedID 16652356
-
Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies
NATURE GENETICS
2006; 38 (5): 556-560
Abstract
The genome-wide distribution of linkage disequilibrium (LD) determines the strategy for selecting markers for association studies, but it varies between populations. We assayed LD in large samples (200 individuals) from each of 11 well-described population isolates and an outbred European-derived sample, using SNP markers spaced across chromosome 22. Most isolates show substantially higher levels of LD than the outbred sample and many fewer regions of very low LD (termed 'holes'). Young isolates known to have had relatively few founders show particularly extensive LD with very few holes; these populations offer substantial advantages for genome-wide association mapping.
View details for DOI 10.1038/ng1770
View details for Web of Science ID 000237147500017
View details for PubMedID 16582909
-
Reconstructing ancestral haplotypes with a dictionary model
JOURNAL OF COMPUTATIONAL BIOLOGY
2006; 13 (3): 767-785
Abstract
We propose a dictionary model for haplotypes. According to the model, a haplotype is constructed by randomly concatenating haplotype segments from a given dictionary of segments. A haplotype block is defined as a set of haplotype segments that begin and end with the same pair of markers. In this framework, haplotype blocks can overlap, and the model provides a setting for testing the accuracy of simpler models invoking only nonoverlapping blocks. Each haplotype segment in a dictionary has an assigned probability and alternate spellings that account for genotyping errors and mutation. The model also allows for missing data, unphased genotypes, and prior distribution of parameters. Likelihood evaluations rely on forward and backward recurrences similar to the ones encountered in hidden Markov models. Parameter estimation is carried out with an EM algorithm. The search for the optimal dictionary is particularly difficult because of the variable dimension of the model space. We define a minimum description length criterion to evaluate each dictionary and use a combination of greedy search and careful initialization to select a best dictionary for a given dataset. Application of the model to simulated data gives encouraging results. In a real dataset, we are able to reconstruct a parsimonious dictionary that captures patterns of linkage disequilibrium well.
View details for Web of Science ID 000237966000011
View details for PubMedID 16706724
-
Bayesian sparse hidden components analysis for transcription regulation networks
BIOINFORMATICS
2006; 22 (6): 739-746
Abstract
In systems like Escherichia coli, the abundance of sequence information, gene expression array studies and small scale experiments allows one to reconstruct the regulatory network and to quantify the effects of transcription factors on gene expression. However, this goal can only be achieved if all information sources are used in concert. Our method integrates literature information, DNA sequences and expression arrays. A set of relevant transcription factors is defined on the basis of literature. Sequence data are used to identify potential target genes and the results are used to define a prior distribution on the topology of the regulatory network. A Bayesian hidden component model for the expression array data allows us to identify which of the potential binding sites are actually used by the regulatory proteins in the studied cell conditions, the strength of their control, and their activation profile in a series of experiments. We apply our methodology to 35 expression studies in E. coli with convincing results. Availability: www.genetics.ucla.edu/labs/sabatti/software.html. Supplementary material is available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btk017
View details for Web of Science ID 000236111600015
View details for PubMedID 16368767
-
A genome-wide linkage scan of familial benign recurrent vertigo: linkage to 22q12 with evidence of heterogeneity
HUMAN MOLECULAR GENETICS
2006; 15 (2): 251-258
Abstract
Benign recurrent vertigo (BRV) is a common disorder affecting up to 2% of the adult population and may be etiologically related to migraine because of similarities in the clinical spectrum of the phenotypes and a high co-morbidity within families. Many families have multiple affected, genetically related individuals, suggesting familial transmission of the disorder with moderate to high penetrance. While BRV is clinically similar to episodic ataxias, no genes contributing to it have been identified and no systematic linkage studies have been performed. In an initial effort to genetically define BRV, we have selected from our Neurology Clinic population a subset of 20 multigenerational families with apparent autosomal dominant transmission, and performed genetic linkage mapping using both parametric and non-parametric linkage (NPL) approaches. The Affymetrix 10K SNP Mapping Assay was used for the genotyping. Heterogeneity LOD (HLOD) analysis reveals evidence of genetic heterogeneity for BRV and evidence of linkage in a subset of the families to 22q12 (HLOD = 4.02). An additional region was identified by NPL analysis at 5p15 (LOD = 2.63). As migraine is observed substantially more commonly both within the BRV-affected individuals and the related family members, it is possible that a form of migraine is allelic to the BRV locus at 22q12. However, testing linkage of the chromosome 22q12 region to a broader migraine/vertigo phenotype by defining affectation status as either migrainous headaches or BRV greatly weakened the linkage signal, and no other significant peaks were detected. Thus, BRV and migraine do not appear to be allelic disorders within these families. We conclude that BRV is a heterogeneous genetic disorder, appears genetically distinct from migraine with aura and is linked to 22q12. Additional family and population-based linkage and association studies will be needed to determine the causative alleles.
View details for DOI 10.1093/hmg/ddi441
View details for Web of Science ID 000234630400007
View details for PubMedID 16330481
-
Linkage disequilibrium and haplotype homozygosity in population samples genotyped at a high marker density
HUMAN HEREDITY
2006; 62 (4): 175-189
Abstract
Analyze the information contained in homozygous haplotypes detected with high density genotyping. We analyze the genotypes of approximately 2,500 markers on chr 22 in 12 population samples, each including 200 individuals. We develop a measure of disequilibrium based on haplotype homozygosity and an algorithm to identify genomic segments characterized by non-random homozygosity (NRH), taking into account allele frequencies, missing data, genotyping error, and linkage disequilibrium. We show how our measure of linkage disequilibrium based on homozygosity leads to results comparable to those of R^2, as well as the importance of correcting for small sample variation when evaluating D'. We observe that the regions that harbor NRH segments tend to be consistent across populations, are gene rich, and are characterized by lower recombination. It is crucial to take into account LD patterns when interpreting long stretches of homozygous markers.
View details for DOI 10.1159/000096599
View details for Web of Science ID 000242847200001
View details for PubMedID 17077642
-
Distribution and dynamics of Lamp1-containing endocytic organelles in fibroblasts deficient in BLOC-3
JOURNAL OF CELL SCIENCE
2005; 118 (22): 5243-5255
Abstract
Late endosomes and lysosomes of mammalian cells in interphase tend to concentrate in the perinuclear region that harbors the microtubule-organizing center. We have previously reported abnormal distribution of these organelles - as judged by reduced percentages of cells displaying pronounced perinuclear accumulation - in mutant fibroblasts lacking BLOC-3 (for 'biogenesis of lysosome-related organelles complex 3'). BLOC-3 is a protein complex that contains the products of the genes mutated in Hermansky-Pudlak syndrome types 1 and 4. Here, we developed a method based on image analysis to estimate the extent of organelle clustering in the perinuclear region of cultured cells. Using this method, we corroborated that the perinuclear clustering of late endocytic organelles containing Lamp1 (for 'lysosome-associated membrane protein 1') is reduced in BLOC-3-deficient murine fibroblasts, and found that it is apparently normal in fibroblasts deficient in BLOC-1 or BLOC-2, which are another two protein complexes associated with Hermansky-Pudlak syndrome. Wild-type and mutant fibroblasts were transfected to express human LAMP1 fused at its cytoplasmic tail to green fluorescent protein (GFP). At low expression levels, LAMP1-GFP was targeted correctly to late endocytic organelles in both wild-type and mutant cells. High levels of LAMP1-GFP overexpression elicited aberrant aggregation of late endocytic organelles, a phenomenon that probably involved formation of anti-parallel dimers of LAMP1-GFP as it was not observed in cells expressing comparable levels of a non-dimerizing mutant variant, LAMP1-mGFP. To test whether BLOC-3 plays a role in the movement of late endocytic organelles, time-lapse fluorescence microscopy experiments were performed using live cells expressing low levels of LAMP1-GFP or LAMP1-mGFP.
Although active movement of late endocytic organelles was observed in both wild-type and mutant fibroblasts, quantitative analyses revealed a relatively lower frequency of microtubule-dependent movement events, either towards or away from the perinuclear region, within BLOC-3-deficient cells. By contrast, neither the duration nor the speed of these microtubule-dependent events seemed to be affected by the lack of BLOC-3 function. These results suggest that BLOC-3 function is required, directly or indirectly, for optimal attachment of late endocytic organelles to microtubule-dependent motors.
View details for DOI 10.1242/jcs.02633
View details for Web of Science ID 000233883500009
View details for PubMedID 16249233
-
A generalized framework for Network Component Analysis
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
2005; 2 (4): 289-301
Abstract
The authors recently introduced a framework, named Network Component Analysis (NCA), for the reconstruction of the dynamics of transcriptional regulators' activities from gene expression assays. The original formulation had certain shortcomings that limited NCA's application to a wide class of network dynamics reconstruction problems, either because of limitations in the sample size or because of the stringent requirements imposed by the set of identifiability conditions. In addition, the performance characteristics of the method for various levels of data noise or in the presence of model inaccuracies were never investigated. In this article, the following aspects of NCA have been addressed, resulting in a set of extensions to the original framework: 1) The sufficient conditions on the a priori connectivity information (required for successful reconstructions via NCA) are made less stringent, allowing easier verification of whether a network topology is identifiable, as well as extending the class of identifiable systems. Such a result is accomplished by introducing a set of identifiability requirements that can be directly tested on the regulatory architecture, rather than on specific instances of the system matrix. 2) The two-stage least-squares iterative procedure used in NCA is proven to identify stationary points of the likelihood function, under a Gaussian noise assumption, thus reinforcing the statistical foundations of the method. 3) A framework for the simultaneous reconstruction of multiple regulatory subnetworks is introduced, thus overcoming one of the critical limitations of the original formulation of the decomposition, for example, occurring for poorly sampled data (typical of microarray experiments).
A set of Monte Carlo simulations we conducted with synthetic data suggests that the approach is indeed capable of accurately reconstructing regulatory signals when these are the input of large-scale networks that satisfy the suggested identifiability criteria, even under fairly noisy conditions. The sensitivity of the reconstructed signals to inaccuracies in the hypothesized network topology is also investigated. We demonstrate the feasibility of our approach for the simultaneous reconstruction of multiple regulatory subnetworks from the same data set with a successful application of the technique to gene expression measurements of the bacterium Escherichia coli.
View details for Web of Science ID 000235704400002
View details for PubMedID 17044167
-
Guidelines for association studies in human molecular genetics
HUMAN MOLECULAR GENETICS
2005; 14 (17): 2481-2483
View details for DOI 10.1093/hmg/ddi251
View details for Web of Science ID 000231473300001
View details for PubMedID 16037069
-
Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites
BIOINFORMATICS
2005; 21 (7): 922-931
Abstract
Gene expression arrays enable measurements of transcription values for a large number of, or all, genes in the genome. In order to better interpret these results and to use them to reconstruct transcription networks, information on location of binding sites for regulatory proteins in the entire genome is needed. In particular, this represents an open problem in Escherichia coli. We describe the first implementation of dictionary-style models to the study of transcription factor binding sites in an entire genome. Vocabulon's unique feature is that it can both reconstruct binding sites characterized by unknown motifs and impute locations of known binding sites in long sequences by simultaneous search. On one hand, the dictionary model specifies a probability for the entire sequence taking simultaneously into account all the possible binding sites. This greatly reduces the number of false positives. On the other hand, the possibility of refining motif description, as an increasing number of binding sites are identified, augments the sensitivity of the method. We illustrate these properties with examples in E. coli. The results of gene expression arrays are used both to guide the search and corroborate it.
View details for DOI 10.1093/bioinformatics/bti083
View details for Web of Science ID 000227977800012
View details for PubMedID 15509602
-
Empirical Bayes estimation of a sparse vector of gene expression changes
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
2005; 4
Abstract
Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available number of replicates. For the purpose of statistical analysis, inference on the "population" difference in expression for genes across the two cell lines is often cast in the framework of hypothesis testing, with the null hypothesis being no change in expression. Given that thousands of genes are investigated at the same time, this requires some multiple comparison correction procedure to be in place. We argue that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments. We instead propose viewing the problem as one of estimation of a vector known to have a large number of zero components. In a Bayesian framework, we describe the prior knowledge on expression changes using mixture priors that incorporate a mass at zero, and we choose a loss function that favors the selection of sparse solutions. We consider two different models applicable to the microarray problem, depending on the nature of replicates available, and show how to explore the posterior distributions of the parameters using MCMC. Simulations show an interesting connection between this Bayesian estimation framework and false discovery rate (FDR) control. Finally, two empirical examples illustrate the practical advantages of this Bayesian estimation paradigm.
View details for Web of Science ID 000238478100016
View details for PubMedID 16646840
-
Inferring protein domain interactions from databases of interacting proteins
GENOME BIOLOGY
2005; 6 (10)
Abstract
We describe domain pair exclusion analysis (DPEA), a method for inferring domain interactions from databases of interacting proteins. DPEA features a log odds score, Eij, reflecting confidence that domains i and j interact. We analyzed 177,233 potential domain interactions underlying 26,032 protein interactions. In total, 3,005 high-confidence domain interactions were inferred, and were evaluated using known domain interactions in the Protein Data Bank. DPEA may prove useful in guiding experiment-based discovery of previously unrecognized domain interactions.
View details for DOI 10.1186/gb-2005-6-10-r89
View details for Web of Science ID 000232679600012
View details for PubMedID 16207360
-
Suggestive linkage to chromosome 6q in families with bilateral vestibulopathy
NEUROLOGY
2004; 63 (12): 2376-2379
Abstract
Of the more than 40 genetically defined dominantly inherited hearing loss syndromes, only a few are associated with bilateral vestibulopathy. No genetic mutations have been identified in families with bilateral vestibulopathy and normal hearing. To perform a genome-wide scan for linkage in four families with dominantly inherited bilateral vestibulopathy. Patients in four families reported brief episodes of vertigo followed by imbalance and oscillopsia. Bilateral vestibulopathy was documented with quantitative rotational testing. Most patients with bilateral vestibulopathy also had migraine. A 10 cM genome-wide screen was conducted using 423 microsatellite markers to identify linkage with vestibulopathy. The authors identified a 24 cM region on chromosome 6q suggestive of linkage to vestibulopathy in these four families (maximum lod score of 2.9 at marker D6S1556). A small fifth family with a different phenotype was not linked to this region on chromosome 6q. This is the first report of linkage in families with dominantly inherited vestibulopathy and normal hearing. Genetic heterogeneity is likely with inherited vestibulopathy.
View details for Web of Science ID 000226010000030
View details for PubMedID 15623703
-
A novel mutation in KCNA1 causes episodic ataxia without myokymia.
Human mutation
2004; 24 (6): 536-?
Abstract
We describe a unique family in which several individuals are affected with episodes of ataxia that best fit the phenotype of episodic ataxia type 2 (EA2). All of the affected family members had episodes typically lasting for several hours, and none of them had muscle abnormalities including myokymia. Episodic ataxia type 1 (EA1) was not considered initially as a clinical diagnosis for the affected individuals in this family. However, by linkage mapping, sequencing and polymorphism analysis, all affected individuals were found to have a novel mutation in KCNA1. Numerous missense mutations have been described previously in KCNA1 that cause EA1. The mutation c.1025G>T replaces a highly conserved serine with isoleucine at position 342 (p.Ser342Ile) in the highly conserved fifth transmembrane domain of KCNA1. This mutation leads to a distinct clinical phenotype without myokymia, broadening the scope of clinical characteristics of EA1 and highlighting the heterogeneity of phenotypic effects from distinct missense mutations.
View details for PubMedID 15532032
-
The use of pedigree, sib-pair and association studies of common diseases for genetic mapping and epidemiology
NATURE GENETICS
2004; 36 (10): 1045-1051
Abstract
Efforts to identify gene variants associated with susceptibility to common diseases use three approaches: pedigree and affected sib-pair linkage studies and association studies of population samples. The different aims of these study designs reflect their derivation from biological versus epidemiological traditions. Similar principles regarding determination of the evidence levels required to consider the results statistically significant apply to both linkage and association studies, however. Such determination requires explicit attention to the prior probability of particular findings, as well as appropriate correction for multiple comparisons. For most common diseases, increasing the sample size in a study is a crucial step in achieving statistically significant genetic mapping results. Recent studies suggest that the technology and statistical methodology will soon be available to make well-powered studies feasible using any of these approaches.
View details for DOI 10.1038/ng1433
View details for Web of Science ID 000224156500009
View details for PubMedID 15454942
-
Transcriptome-based determination of multiple transcription regulator activities in Escherichia coli by using network component analysis
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (2): 641-646
Abstract
Cells adjust gene expression profiles in response to environmental and physiological changes through a series of signal transduction pathways. Upon activation or deactivation, the terminal regulators bind to or dissociate from DNA, respectively, and modulate transcriptional activities on particular promoters. Traditionally, individual reporter genes have been used to detect the activity of the transcription factors. This approach works well for simple, non-overlapping transcription pathways. For complex transcriptional networks, more sophisticated tools are required to deconvolute the contribution of each regulator. Here, we demonstrate the utility of network component analysis in determining multiple transcription factor activities based on transcriptome profiles and available information regarding network connectivity. We used Escherichia coli carbon source transition from glucose to acetate as a model system. Key results from this analysis were either consistent with physiology or verified by using independent measurements.
View details for DOI 10.1073/pnas.0305287101
View details for Web of Science ID 000188210400042
View details for PubMedID 14694202
-
A Bayesian approach to expression network component analysis
26th Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society
IEEE. 2004: 2933–2936
View details for Web of Science ID 000225461800759
-
A Bayesian approach to expression network component analysis.
Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference
2004; 4: 2933-2936
Abstract
A semiblind deconvolution method of analysis for gene expression data was proposed recently in a series of articles that appeared in PNAS. We illustrate here how similar goals can be achieved in a Bayesian framework and how necessary information on the presence of binding sites can be obtained with Vocabulon, an algorithm based on a stochastic dictionary model.
View details for PubMedID 17270892
-
Network component analysis: Reconstruction of regulatory signals in biological systems
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2003; 100 (26): 15522-15527
Abstract
High-dimensional data sets generated by high-throughput technologies, such as DNA microarray, are often the outputs of complex networked systems driven by hidden regulatory signals. Traditional statistical methods for computing low-dimensional or hidden representations of these data sets, such as principal component analysis and independent component analysis, ignore the underlying network structures and provide decompositions based purely on a priori statistical constraints on the computed component signals. The resulting decomposition thus provides a phenomenological model for the observed data and does not necessarily contain physically or biologically meaningful signals. Here, we develop a method, called network component analysis, for uncovering hidden regulatory signals from outputs of networked systems, when only a partial knowledge of the underlying network topology is available. The a priori network structure information is first tested for compliance with a set of identifiability criteria. For networks that satisfy the criteria, the signals from the regulatory nodes and their strengths of influence on each output node can be faithfully reconstructed. This method is first validated experimentally by using the absorbance spectra of a network of various hemoglobin species. The method is then applied to microarray data generated from the yeast Saccharomyces cerevisiae, and the activities of various transcription factors during the cell cycle are reconstructed by using recently discovered connectivity information for the underlying transcriptional regulatory networks.
View details for DOI 10.1073/pnas.2136632100
View details for Web of Science ID 000187554600044
View details for PubMedID 14673099
-
Global analysis of gene expression in neural progenitors reveals specific cell-cycle, signaling, and metabolic networks
DEVELOPMENTAL BIOLOGY
2003; 261 (1): 165-182
Abstract
The genetic programs underlying neural stem cell (NSC) proliferation and pluripotentiality have only been partially elucidated. We compared the gene expression profile of proliferating neural stem cell cultures (NS) with cultures differentiated for 24 h (DC) to identify functionally coordinated alterations in gene expression associated with neural progenitor proliferation. The majority of differentially expressed genes (65%) were upregulated in NS relative to DC. Microarray analysis of this in vitro system was followed by high throughput screening in situ hybridization to identify genes enriched in the germinal neuroepithelium, so as to distinguish those expressed in neural progenitors from those expressed in more differentiated cells in vivo. NS cultures were characterized by the coordinate upregulation of genes involved in cell cycle progression, DNA synthesis, and metabolism, not simply related to general features of cell proliferation, since many of the genes identified were highly enriched in the CNS ventricular zones and not widely expressed in other proliferating tissues. Components of specific metabolic and signal transduction pathways, and several transcription factors, including Sox3, FoxM1, and PTTG1, were also enriched in neural progenitor cultures. We propose a putative network of gene expression linking cell cycle control to cell fate pathways, providing a framework for further investigations of neural stem cell proliferation and differentiation.
View details for DOI 10.1016/S0012-1606(03)00274-4
View details for Web of Science ID 000185224400011
View details for PubMedID 12941627
-
False discovery rate in linkage and association genome screens for complex disorders
GENETICS
2003; 164 (2): 829-833
Abstract
We explore the implications of the false discovery rate (FDR) controlling procedure in disease gene mapping. With the aid of simulations, we show how, under models commonly used, the simple step-down procedure introduced by Benjamini and Hochberg controls the FDR for the dependent tests on which linkage and association genome screens are based. This adaptive multiple comparison procedure may offer an important tool for mapping susceptibility genes for complex diseases.
View details for Web of Science ID 000183880000042
View details for PubMedID 12807801
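As an illustration of the Benjamini and Hochberg procedure named in the abstract above, here is a minimal sketch in plain Python. The function name `benjamini_hochberg` and the parameter `q` are illustrative choices, not taken from the paper. The sketch finds the largest rank k whose ordered p-value is at most k·q/m and rejects the k smallest p-values; it does not model the dependence between tests that the paper studies by simulation.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at target FDR level q.

    Illustrative sketch only: p_values is a list of p-values, one per test.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Remember the largest rank whose p-value falls under its threshold.
        if p_values[i] <= rank * q / m:
            k = rank
    # Reject every hypothesis ranked at or below k.
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical p-values for three marker tests.
    print(benjamini_hochberg([0.02, 0.03, 0.9], q=0.05))
```

Note that in the usage example the smallest p-value (0.02) exceeds its own threshold (0.05/3), yet it is still rejected because the second-ranked p-value (0.03) falls under 2 × 0.05/3.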
-
The Human Phenome Project
NATURE GENETICS
2003; 34 (1): 15-21
View details for Web of Science ID 000182667900006
View details for PubMedID 12721547
-
Dictionary model for the analysis of E-Coli promoter regions
25th Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society
IEEE. 2003: 3711–3714
View details for Web of Science ID 000189395300971
-
Genomewide motif identification using a dictionary model
PROCEEDINGS OF THE IEEE
2002; 90 (11): 1803-1810
View details for DOI 10.1109/JPROC.2002.804689
View details for Web of Science ID 000179204700010
-
Microanalysis of DNA microarrays
ASM NEWS
2002; 68 (9): 432-437
View details for Web of Science ID 000178021500010
-
Familial horizontal gaze palsy with progressive scoliosis maps to chromosome 11q23-25
NEUROLOGY
2002; 59 (3): 432-435
Abstract
Horizontal gaze palsy with progressive scoliosis (HGPS) is a rare, autosomal recessive disorder characterized by a congenital absence of conjugate horizontal eye movement, with progressive scoliosis developing in childhood or adolescence. The authors identified two unrelated consanguineous families with HGPS. Genomewide homozygosity mapping and linkage analysis mapped the disease locus to a 30-cM interval on chromosome 11q23-25 (combined maximum multipoint lod score Z = 5.46).
View details for Web of Science ID 000177335800022
View details for PubMedID 12177379
-
Co-expression pattern from DNA microarray experiments as a tool for operon prediction
NUCLEIC ACIDS RESEARCH
2002; 30 (13): 2886-2893
Abstract
The prediction of operons, the smallest unit of transcription in prokaryotes, is the first step towards reconstruction of a regulatory network at the whole genome level. Sequence information, in particular the distance between open reading frames, has been used to predict if adjacent Escherichia coli genes are in an operon. While appreciably successful, these predictions need to be validated and refined experimentally. As a growing number of gene expression array experiments on E.coli became available, we investigated to what extent they could be used to improve and validate these predictions. To this end, we examined a large collection of published microarray data. The correlation between expression ratios of adjacent genes was used in a Bayesian classification scheme to predict whether the genes are in an operon or not. We found that for the genes whose expression levels change significantly across the experiments in the data set, the currently available gene expression data allowed a significant refinement of the sequence-based predictions. We report these co-expression correlations in an E.coli genomic map. For a significant portion of gene pairs, however, the set of array experiments considered did not contain sufficient information to determine whether they are in the same transcriptional unit. This is not due to unreliability of the array data per se, but to the design of the experiments analyzed. In general, experiments that perturb a large number of genes offer more information for operon prediction than confined perturbations. These results provide a rationale for conducting expression studies comparing conditions that cause global changes in gene expression.
View details for Web of Science ID 000176607000021
View details for PubMedID 12087173
-
Homozygosity and linkage disequilibrium
GENETICS
2002; 160 (4): 1707-1719
Abstract
We illustrate how homozygosity of haplotypes can be used to measure the level of disequilibrium between two or more markers. An excess of either homozygosity or heterozygosity signals a departure from the gametic phase equilibrium: We describe the specific form of dependence that is associated with high (low) homozygosity and derive various linkage disequilibrium measures. They feature a clear biological interpretation, can be used to construct tests, and are standardized to allow comparison across loci and populations. They are particularly advantageous to measure linkage disequilibrium between highly polymorphic markers.
View details for Web of Science ID 000175237200039
View details for PubMedID 11973323
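The idea in the abstract above, that an excess of haplotype homozygosity signals a departure from gametic phase equilibrium, can be sketched for two markers as follows. This is an illustration under stated assumptions, not the standardized measures derived in the paper: homozygosity is taken as the sum of squared frequencies, and under equilibrium the haplotype homozygosity equals the product of the single-marker homozygosities. The function names are hypothetical.

```python
from collections import Counter


def homozygosity(items):
    """Sum of squared relative frequencies of the observed types."""
    counts = Counter(items)
    n = len(items)
    return sum((c / n) ** 2 for c in counts.values())


def excess_haplotype_homozygosity(haplotypes):
    """Observed minus equilibrium-expected haplotype homozygosity.

    haplotypes: list of (allele_at_marker_1, allele_at_marker_2) tuples.
    A value near zero is consistent with equilibrium; a positive value
    indicates excess homozygosity, a negative value excess heterozygosity.
    """
    observed = homozygosity(haplotypes)
    # Under equilibrium, haplotype frequencies are products of allele
    # frequencies, so the homozygosities multiply as well.
    expected = (homozygosity([h[0] for h in haplotypes])
                * homozygosity([h[1] for h in haplotypes]))
    return observed - expected


if __name__ == "__main__":
    # Hypothetical samples: alleles paired independently vs. perfectly.
    independent = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
    associated = [("A", "B"), ("A", "B"), ("a", "b"), ("a", "b")]
    print(excess_haplotype_homozygosity(independent))
    print(excess_haplotype_homozygosity(associated))
```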
-
Thresholding rules for recovering a sparse signal from microarray experiments
MATHEMATICAL BIOSCIENCES
2002; 176 (1): 17-34
Abstract
We consider array experiments that compare expression levels of a large number of genes in two cell lines with few repetitions and with no subject effect. We develop a statistical model that illustrates under which assumptions thresholding is optimal in the analysis of such microarray data. The results of our model explain the success of the empirical rule of two-fold change. We illustrate a thresholding procedure that is adaptive to the noise level of the experiment, the number of genes analyzed, and the number of genes that truly change expression level. This procedure, in a world of perfect knowledge on noise distribution, would allow reconstruction of a sparse signal, minimizing the false discovery rate. Given the amount of information actually available, the thresholding rule described provides a reasonable estimator for the change in expression of any gene in two compared cell lines.
View details for Web of Science ID 000174539900003
View details for PubMedID 11867081
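To make the empirical two-fold rule mentioned in the abstract above concrete, here is a minimal sketch of hard thresholding applied to log2 expression ratios. It assumes a fixed threshold `t` (with `t = 1.0` corresponding to a two-fold change on the log2 scale); the adaptive threshold developed in the paper, which depends on the noise level and on the number of genes, is not reproduced here. The function name is hypothetical.

```python
def hard_threshold(log2_ratios, t=1.0):
    """Estimate expression changes by keeping only large log2 ratios.

    Ratios with absolute value above t are kept unchanged; all others are
    set to zero, which yields a sparse estimate of the true changes.
    """
    return [x if abs(x) > t else 0.0 for x in log2_ratios]


if __name__ == "__main__":
    # Hypothetical log2 ratios for four genes.
    print(hard_threshold([0.3, -1.5, 1.0, 2.2]))
```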
-
An evaluation of tyramide signal amplification and archived fixed and frozen tissue in microarray gene expression analysis
NUCLEIC ACIDS RESEARCH
2002; 30 (2)
Abstract
Archival formalin-fixed, paraffin-embedded and ethanol-fixed tissues represent a potentially invaluable resource for gene expression analysis, as they are the most widely available material for studies of human disease. Little data are available evaluating whether RNA obtained from fixed (archival) tissues could produce reliable and reproducible microarray expression data. Here we compare the use of RNA isolated from human archival tissues fixed in ethanol and formalin to frozen tissue in cDNA microarray experiments. Since an additional factor that can limit the utility of archival tissue is the often small quantities available, we also evaluate the use of the tyramide signal amplification method (TSA), which allows the use of small amounts of RNA. Detailed analysis indicates that TSA provides a consistent and reproducible signal amplification method for cDNA microarray analysis, across both arrays and the genes tested. Analysis of this method also highlights the importance of performing non-linear channel normalization and dye switching. Furthermore, archived, fixed specimens can perform well, but not surprisingly, produce more variable results than frozen tissues. Consistent results are more easily obtainable using ethanol-fixed tissues, whereas formalin-fixed tissue does not typically provide a useful substrate for cDNA synthesis and labeling.
View details for Web of Science ID 000173551200028
View details for PubMedID 11788730
-
Dissecting a population genome for targeted screening of disease mutations
HUMAN MOLECULAR GENETICS
2001; 10 (26): 2961-2972
Abstract
Compared to mixed populations, population isolates such as Finland show distinct differences in the prevalence of disease mutations. However, little information exists on differences in the prevalence of disease alleles among regional populations with different histories of multiple bottlenecks. We constructed a DNA-array and monitored the prevalence of 31 rare and common disease mutations underlying 27 clinical phenotypes in a large population-based study sample. Over 64 000 genotypes were assigned in 2151 samples from four geographical areas representing early and late settlement regions of Finland. Each sample was analyzed in duplicate and a total of 142 000 array-derived genotyping calls were made. On average one in three individuals was found to be a carrier of one of the 31 monitored mutations. This should remove fears of the stigmatizing effect of a carrier-screening program monitoring multiple diseases. Regional differences were found in the prevalence of mutations, providing molecular evidence for the deviating population histories of regional subisolates. The mutations introduced early into the population revealed relatively even distribution in different subregions. More recently introduced rare mutations showed local clustering of disease alleles, indicating the persistence of population subisolates and the effect of multiple bottlenecks in molding the population gene pool. Regional differences were observed also for common disease alleles. Such precise information on the carrier frequencies could form the basis for targeted genetic screens in this population. Our approach describes a general paradigm for large-scale carrier-screening programs also in other populations.
View details for Web of Science ID 000172870300001
View details for PubMedID 11751678
-
Bayesian analysis of haplotypes for linkage disequilibrium mapping
GENOME RESEARCH
2001; 11 (10): 1716-1724
Abstract
Haplotype analysis of disease chromosomes can help identify probable historical recombination events and localize disease mutations. Most available analyses use only marginal and pairwise allele frequency information. We have developed a Bayesian framework that utilizes full haplotype information to overcome various complications such as multiple founders, unphased chromosomes, data contamination, and incomplete marker data. A stochastic model is used to describe the dependence structure among several variables characterizing the observed haplotypes, for example, the ancestral haplotypes and their ages, mutation rate, recombination events, and the location of the disease mutation. An efficient Markov chain Monte Carlo algorithm was developed for computing the estimates of the quantities of interest. The method is shown to perform well in both real data sets (cystic fibrosis data and Friedreich ataxia data) and simulated data sets. The program that implements the proposed method, BLADE, as well as the two real datasets, can be obtained from http://www.fas.harvard.edu/~junliu/TechRept/01folder/diseq_prog.tar.gz.
View details for Web of Science ID 000171456000013
View details for PubMedID 11591648
View details for PubMedCentralID PMC311130
-
Generalised Gibbs sampler and multigrid Monte Carlo for Bayesian computation
BIOMETRIKA
2000; 87 (2): 353-369
View details for Web of Science ID 000087815100009
-
The DYT1 phenotype and guidelines for diagnostic testing
NEUROLOGY
2000; 54 (9): 1746-1752
Abstract
To develop diagnostic testing guidelines for the DYT1 GAG deletion in the Ashkenazi Jewish (AJ) and non-Jewish (NJ) primary torsion dystonia (PTD) populations and to determine the range of dystonic features in affected DYT1 deletion carriers. The authors screened 267 individuals with PTD; 170 were clinically ascertained for diagnosis and treatment, 87 were affected family members ascertained for genetic studies, and 10 were clinically and genetically ascertained and included in both groups. We used published primers and PCR amplification across the critical DYT1 region to determine GAG deletion status. Features of dystonia in clinically ascertained (affected) DYT1 GAG deletion carriers and noncarriers were compared to determine a classification scheme that optimized prediction of carriers. The authors assessed the range of clinical features in the genetically ascertained (affected) DYT1 deletion carriers and tested for differences between AJ and NJ patients. The optimal algorithm for classification of clinically ascertained carriers was disease onset before age 24 years in a limb (misclassification, 16.5%; sensitivity, 95%; specificity, 80%). Although application of this classification scheme provided good separation in the AJ group (sensitivity, 96%; specificity, 88%), as well as in the group overall, it was less specific in discriminating NJ carriers from noncarriers (sensitivity, 94%; specificity, 69%). Using age 26 years as the cut-off and any site at onset gave a sensitivity of 100%, but specificity decreased to 54% (63% in AJ and 43% in NJ). Among genetically ascertained carriers, onset up to age 44 years occurred, although the great majority displayed early limb onset. There were no significant differences between AJ and NJ genetically ascertained carriers, except that a higher proportion of NJ carriers had onset in a leg, rather than an arm, and widespread disease. Diagnostic DYT1 testing in conjunction with genetic counseling is recommended for patients with PTD with onset before age 26 years, as this single criterion detected 100% of clinically ascertained carriers, with specificities of 43% to 63%. Testing patients with onset after age 26 years also may be warranted in those having an affected relative with early onset, as the only carriers we observed with onset at age 26 or later were genetically ascertained relatives of individuals whose symptoms started before age 26 years.
View details for Web of Science ID 000086908000007
View details for PubMedID 10802779
-
A genetic region on chromosome 16 may predispose to the development of Crohn's disease at an early age in Ashkenazi Jews.
CELL PRESS. 1999: A99–A99
View details for Web of Science ID 000082879800521
-
Simulated sintering: Markov chain Monte Carlo with spaces of varying dimensions
6th Valencia International Meeting on Bayesian Statistics
OXFORD UNIV PRESS. 1999: 389–413
View details for Web of Science ID 000169678800017