Hua Tang
Professor of Genetics and, by courtesy, of Statistics
Bio
Dr. Tang received her PhD in Statistics, with a minor in Genetics, from Stanford University in 2002. From 2002 to 2006, she was a faculty member in the Public Health Sciences (PHS) Division at the Fred Hutchinson Cancer Research Center. Dr. Tang joined the Stanford Department of Genetics in 2007. The goals of her research are to better understand the evolutionary forces that have shaped the pattern of genetic variation in humans and to elucidate the genetic architecture of complex traits and diseases in the context of human evolution.
Academic Appointments
- Professor, Genetics
- Professor (By courtesy), Statistics
- Member, Bio-X
- Member, Stanford Cancer Institute
- Member, Wu Tsai Neurosciences Institute
Professional Education
- AB, Harvard and Radcliffe College, Biology (1997)
- PhD, Stanford University, Statistics (minor Genetics) (2002)
Current Research and Scholarly Interests
Research in our laboratory develops and applies statistical methods for analyzing patterns of human genetic variation, which underlie the phenotypic diversity of our species. We are collaborating on various genome-wide studies focusing on stratified or recently admixed populations. These studies offer unique opportunities to elucidate the evolutionary forces that have shaped the patterns of genetic variation in humans, to uncover the genetic basis of complex traits, and to shed light on the mechanisms that lead to diverse phenotypes and disparate disease risks among populations.
2024-25 Courses
Independent Studies (8)
- Biomedical Informatics Teaching Methods, BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research, BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Genetics, GENE 299 (Aut, Win, Spr, Sum)
- Graduate Research, GENE 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research, BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research, GENE 370 (Aut, Win, Spr, Sum)
- Supervised Study, GENE 260 (Aut, Win, Spr, Sum)
- Undergraduate Research, GENE 199 (Aut, Win, Spr, Sum)
All Publications
-
Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies.
American Journal of Human Genetics
2019
Abstract
Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.
View details for DOI 10.1016/j.ajhg.2019.08.012
View details for PubMedID 31564439
-
Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach
American Journal of Human Genetics
2015; 96 (5): 740-752
Abstract
Elucidating the genetic basis of complex traits and diseases in non-European populations is particularly challenging because US minority populations have been under-represented in genetic association studies. We developed an empirical Bayes approach named XPEB (cross-population empirical Bayes), designed to improve the power for mapping complex-trait-associated loci in a minority population by exploiting information from genome-wide association studies (GWASs) from another ethnic population. Taking as input summary statistics from two GWASs (a target GWAS from an ethnic minority population of primary interest and an auxiliary base GWAS, such as a larger GWAS in Europeans), our XPEB approach reprioritizes SNPs in the target population to compute local false-discovery rates. We demonstrated, through simulations, that whenever the base GWAS harbors relevant information, XPEB gains efficiency. Moreover, XPEB can discard irrelevant auxiliary information, providing a safeguard against inflated false-discovery rates due to genetic heterogeneity between populations. Applied to a blood-lipids study in African Americans, XPEB more than quadrupled the discoveries from the conventional approach, which used a target GWAS alone, bringing the number of significant loci from 14 to 65. Thus, XPEB offers a flexible framework for mapping complex traits in minority populations.
View details for DOI 10.1016/j.ajhg.2015.03.008
View details for Web of Science ID 000354189300005
View details for PubMedID 25892113
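The local false-discovery rate that XPEB reports can be illustrated with the standard two-groups model, lfdr(z) = pi0 * f0(z) / f(z). The sketch below is not XPEB itself: it assumes a fixed null proportion `pi0` and a hypothetical normal alternative with standard deviation `alt_sd`, whereas XPEB estimates these quantities from the target and base GWAS summary statistics. It only shows why a SNP that the base GWAS makes a priori more plausible (a smaller, hypothetical `pi0`) receives a smaller lfdr for the same target-population z-score.

```python
import math


def normal_pdf(z: float, sd: float = 1.0) -> float:
    """Density of a mean-zero normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z: float, pi0: float, alt_sd: float) -> float:
    """Two-groups local false-discovery rate for one SNP z-score.

    Null:        z ~ N(0, 1)
    Alternative: z ~ N(0, alt_sd**2)   (hypothetical choice for illustration)
    lfdr(z) = pi0 * f0(z) / (pi0 * f0(z) + (1 - pi0) * f1(z))
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, sd=alt_sd)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    z = 4.0  # target-population z-score for one SNP
    # Hypothetical priors: no support vs. strong support from the base GWAS.
    print("uninformative prior:", local_fdr(z, pi0=0.99, alt_sd=3.0))
    print("base-GWAS-informed prior:", local_fdr(z, pi0=0.90, alt_sd=3.0))
```

In an empirical Bayes method the prior quantities are learned from the genome-wide data rather than fixed by hand; hard-coding them here just keeps the reprioritization effect visible.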
-
Genome-wide characterization of shared and distinct genetic components that influence blood lipid levels in ethnically diverse human populations.
American Journal of Human Genetics
2013; 92 (6): 904-916
Abstract
Blood lipid concentrations are heritable risk factors associated with atherosclerosis and cardiovascular diseases. Lipid traits exhibit considerable variation among populations of distinct ancestral origin as well as between individuals within a population. We performed association analyses to identify genetic loci influencing lipid concentrations in African American and Hispanic American women in the Women's Health Initiative SNP Health Association Resource. We validated one African-specific high-density lipoprotein cholesterol locus at CD36 as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations.
View details for DOI 10.1016/j.ajhg.2013.04.025
View details for PubMedID 23726366
View details for PubMedCentralID PMC3675231
-
Genetic Architecture of Skin and Eye Color in an African-European Admixed Population
PLOS Genetics
2013; 9 (3)
Abstract
Variation in human skin and eye color is substantial and especially apparent in admixed populations, yet the underlying genetic architecture is poorly understood because most genome-wide studies are based on individuals of European ancestry. We study pigmentary variation in 699 individuals from Cape Verde, where extensive West African/European admixture has given rise to a broad range in trait values and genomic ancestry proportions. We develop and apply a new approach for measuring eye color, and identify two major loci (HERC2[OCA2] P = 2.3 × 10^-62, SLC24A5 P = 9.6 × 10^-9) that account for both blue versus brown eye color and varying intensities of brown eye color. We identify four major loci (SLC24A5 P = 5.4 × 10^-27, TYR P = 1.1 × 10^-9, APBA2[OCA2] P = 1.5 × 10^-8, SLC45A2 P = 6 × 10^-9) for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (~44%) is average genomic ancestry. Our results suggest that adjacent cis-acting regulatory loci for OCA2 explain the relationship between skin and eye color, and point to an underlying genetic architecture in which several genes of moderate effect act together with many genes of small effect to explain ~70% of the estimated heritability.
View details for DOI 10.1371/journal.pgen.1003372
View details for Web of Science ID 000316866700048
View details for PubMedID 23555287
View details for PubMedCentralID PMC3605137
-
Ancestral Components of Admixed Genomes in a Mexican Cohort
PLOS Genetics
2011; 7 (12)
Abstract
For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study "virtual genomes" of admixed individuals. We apply this approach to a cohort of 492 parent-offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations (Africa, Europe, and America) vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10-15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease-related phenotypes and will allow new insight into the adaptive and demographic history of indigenous people.
View details for DOI 10.1371/journal.pgen.1002410
View details for Web of Science ID 000299167900027
View details for PubMedID 22194699
View details for PubMedCentralID PMC3240599
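The link between haplotype block length and admixture time rests on recombination breaking ancestry segments into shorter pieces each generation. A common back-of-the-envelope version, assuming a single admixture pulse t generations ago (a simplification; the paper's own estimation procedure may differ), treats segments from a source that contributes a fraction m of the genome as roughly exponential in length with rate t * (1 - m) per Morgan, so t ≈ 1 / (mean segment length * (1 - m)). The function names and example numbers below are hypothetical.

```python
def expected_tract_length(generations: float, fraction: float) -> float:
    """Mean ancestry-tract length in Morgans under a single-pulse model.

    Tracts from a source contributing `fraction` of the genome are treated
    as exponential with rate generations * (1 - fraction) per Morgan.
    """
    return 1.0 / (generations * (1.0 - fraction))


def admixture_time(mean_tract_morgans: float, fraction: float) -> float:
    """Invert expected_tract_length to estimate generations since admixture."""
    return 1.0 / (mean_tract_morgans * (1.0 - fraction))


if __name__ == "__main__":
    # Hypothetical numbers: a source contributing half of the genome whose
    # tracts average 0.15 Morgans (15 cM) in length.
    print(admixture_time(0.15, 0.5))
```

Older admixture means more generations of recombination and therefore shorter tracts; real populations with continuous or multiple pulses of gene flow need richer models than this single-pulse sketch.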
-
Worldwide human relationships inferred from genome-wide patterns of variation
Science
2008; 319 (5866): 1100-1104
Abstract
Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.
View details for DOI 10.1126/science.1153717
View details for Web of Science ID 000253311700046
View details for PubMedID 18292342
-
Reconstructing genetic ancestry blocks in admixed individuals
American Journal of Human Genetics
2006; 79 (1): 1-12
Abstract
A chromosome in an individual of recently admixed ancestry resembles a mosaic of chromosomal segments, or ancestry blocks, each derived from a particular ancestral population. We consider the problem of inferring ancestry along the chromosomes in an admixed individual and thereby delineating the ancestry blocks. Using a simple population model, we infer gene-flow history in each individual. Compared with existing methods, which are based on a hidden Markov model, the Markov-hidden Markov model (MHMM) we propose has the advantage of accounting for the background linkage disequilibrium (LD) that exists in ancestral populations. When there are more than two ancestral groups, we allow each ancestral population to admix at a different time in history. We use simulations to illustrate the accuracy of the inferred ancestry as well as the importance of modeling the background LD; not accounting for background LD between markers may lead to false inferences of mixed ancestry in an indigenous population. The MHMM makes it possible to identify genomic blocks of a particular ancestry by use of any high-density single-nucleotide-polymorphism panel. One application of our method is to perform admixture mapping without genotyping special ancestry-informative-marker panels.
View details for Web of Science ID 000238341200001
View details for PubMedID 16773560
-
Transcriptome variation in human tissues revealed by long-read sequencing.
Nature
2022
Abstract
Regulation of transcript structure generates transcript diversity and plays an important role in human disease (refs. 1-7). The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure (refs. 8-16). In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
View details for DOI 10.1038/s41586-022-05035-y
View details for PubMedID 35922509
-
Large-scale genome-wide association study of coronary artery disease in genetically diverse populations.
Nature Medicine
2022
Abstract
We report a genome-wide association study (GWAS) of coronary artery disease (CAD) incorporating nearly a quarter of a million cases, in which existing studies are integrated with data from cohorts of white, Black and Hispanic individuals from the Million Veteran Program. We document near equivalent heritability of CAD across multiple ancestral groups, identify 95 novel loci, including nine on the X chromosome, detect eight loci of genome-wide significance in Black and Hispanic individuals, and demonstrate that two common haplotypes at the 9p21 locus are responsible for risk stratification in all populations except those of African origin, in which these haplotypes are virtually absent. Moreover, in the largest GWAS for angiographically derived coronary atherosclerosis performed to date, we find 15 loci of genome-wide significance that robustly overlap with established loci for clinical CAD. Phenome-wide association analyses of novel loci and polygenic risk scores (PRSs) augment signals related to insulin resistance, extend pleiotropic associations of these loci to include smoking and family history, and precisely document the markedly reduced transferability of existing PRSs to Black individuals. Downstream integrative analyses reinforce the critical roles of vascular endothelial, fibroblast, and smooth muscle cells in CAD susceptibility, but also point to a shared biology between atherosclerosis and oncogenesis. This study highlights the value of diverse populations in further characterizing the genetic architecture of CAD.
View details for DOI 10.1038/s41591-022-01891-3
View details for PubMedID 35915156
-
Sex-biased admixture and assortative mating shape genetic variation and influence demographic inference in admixed Cabo Verdeans.
G3 (Bethesda, Md.)
2022
Abstract
Genetic data can provide insights into population history, but first we must understand the patterns that complex histories leave in genomes. Here, we consider the admixed human population of Cabo Verde to understand the patterns of genetic variation left by social and demographic processes. Cabo Verde was first settled in the late 1400s, and Cabo Verdeans are admixed descendants of Portuguese colonizers and enslaved West African people. We consider Cabo Verde's well-studied historical record alongside genome-wide SNP data from 563 individuals from 4 regions within the archipelago. We use genetic ancestry to test for patterns of nonrandom mating and sex-specific gene flow, and we examine the consequences of these processes for common demographic inference methods and for genetic patterns. Notably, multiple population genetic tools that assume random mating underestimate the timing of admixture, but incorporating nonrandom mating produces estimates more consistent with historical records. We consider how admixture interrupts common summaries of genomic variation such as runs of homozygosity (ROH). While summaries of ROH may be difficult to interpret in admixed populations, differentiating ROH by length class shows that ROH reflect historical differences between the islands in their contributions from the source populations and in post-admixture population dynamics. Finally, we find higher African ancestry on the X chromosome than on the autosomes, consistent with an excess of European males and African females contributing to the gene pool. Considering these genomic insights into population history in the context of Cabo Verde's historical record, we can identify how assumptions in genetic models affect inference of population history more broadly.
View details for DOI 10.1093/g3journal/jkac183
View details for PubMedID 35861404
-
Robust Identification of Temporal Biomarkers in Longitudinal Omics Studies.
Bioinformatics (Oxford, England)
2022
Abstract
Longitudinal studies increasingly collect rich 'omics' data sampled frequently over time and across large cohorts to capture dynamic health fluctuations and disease transitions. However, the generation of longitudinal omics data has outpaced the development of analysis tools that can efficiently extract insights from such data. In particular, there is a need for statistical frameworks that can identify not only which omics features are differentially regulated between groups but also over what time intervals. Additionally, longitudinal omics data may have inconsistencies, including nonuniform sampling intervals, missing data points, subject dropout, and differing numbers of samples per subject. In this work, we developed OmicsLonDA, a statistical method that provides robust identification of time intervals of temporal omics biomarkers. OmicsLonDA is based on a semi-parametric approach, in which we use smoothing splines to model longitudinal data and infer significant time intervals of omics features based on an empirical distribution constructed through a permutation procedure. We benchmarked OmicsLonDA on five simulated datasets with diverse temporal patterns, and the method showed specificity greater than 0.99 and sensitivity greater than 0.87. Applying OmicsLonDA to the iPOP cohort revealed temporal patterns of genes, proteins, hormone metabolites, and microbes that are differentially regulated in male versus female subjects following a respiratory infection. In addition, we applied OmicsLonDA to the longitudinal multi-omics dataset of pregnant women with and without preeclampsia, and the method identified potential lipid markers that differ significantly over time between the two groups. We provide an open-source R package (https://bioconductor.org/packages/OmicsLonDA) to enable widespread use. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btac403
View details for PubMedID 35762936
-
Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data.
Nature Genetics
2022
Abstract
Analyses of data from genome-wide association studies on unrelated individuals have shown that, for human traits and diseases, approximately one-third to two-thirds of heritability is captured by common SNPs. However, it is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular whether the causal variants are rare, or whether it is overestimated due to bias in inference from pedigree data. Here we estimated heritability for height and body mass index (BMI) from whole-genome sequence data on 25,465 unrelated individuals of European ancestry. The estimated heritability was 0.68 (standard error 0.10) for height and 0.30 (standard error 0.10) for body mass index. Low minor allele frequency variants in low linkage disequilibrium (LD) with neighboring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection. Our results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.
View details for DOI 10.1038/s41588-021-00997-7
View details for PubMedID 35256806
-
Rare transmission of commensal and pathogenic bacteria in the gut microbiome of hospitalized adults.
Nature Communications
2022; 13 (1): 586
Abstract
Bacterial bloodstream infections are a major cause of morbidity and mortality among patients undergoing hematopoietic cell transplantation (HCT). Although previous research has demonstrated that pathogens may translocate from the gut microbiome into the bloodstream to cause infections, the mechanisms by which HCT patients acquire pathogens in their microbiome have not yet been described. Here, we use linked-read and short-read metagenomic sequencing to analyze 401 stool samples collected from 149 adults undergoing HCT and hospitalized in the same unit over three years, many of whom were roommates. We use metagenomic assembly and strain-specific comparison methods to search for high-identity bacterial strains, which may indicate transmission between the gut microbiomes of patients. Overall, the microbiomes of patients who share time and space in the hospital do not converge in taxonomic composition. However, we do observe six pairs of patients who harbor identical or nearly identical strains of the pathogen Enterococcus faecium, or the gut commensals Akkermansia muciniphila and Hungatella hathewayi. These shared strains may result from direct transmission between patients who shared a room and bathroom, acquisition from a common hospital source, or transmission from an unsampled intermediate. We also identify multiple patients with identical strains of species commonly found in commercial probiotics, including Lactobacillus rhamnosus and Streptococcus thermophilus. In summary, our findings indicate that sharing of identical pathogens between the gut microbiomes of multiple patients is a rare phenomenon. Furthermore, the observed potential transmission of commensal, immunomodulatory microbes suggests that exposure to other humans may contribute to microbiome reassembly post-HCT.
View details for DOI 10.1038/s41467-022-28048-7
View details for PubMedID 35102136
-
Whole genome sequence analysis of platelet traits in the NHLBI trans-omics for precision medicine initiative.
Human Molecular Genetics
2021
Abstract
Platelets play a key role in thrombosis and hemostasis. Platelet count (PLT) and mean platelet volume (MPV) are highly heritable quantitative traits, with hundreds of genetic signals previously identified, mostly in European ancestry populations. Here we use whole genome sequencing from NHLBI's Trans-Omics for Precision Medicine Initiative (TOPMed) in a large multi-ethnic sample to further explore common and rare variation contributing to PLT (n = 61,200) and MPV (n = 23,485). We identified and replicated secondary signals at MPL (rs532784633) and PECAM1 (rs73345162), both more common in African ancestry populations. We also observed rare variation in Mendelian platelet-related disorder genes influencing variation in platelet traits in TOPMed cohorts (not enriched for blood disorders). For example, the association of GP9 with lower PLT and higher MPV was partly driven by a pathogenic Bernard-Soulier syndrome variant (rs5030764, p.Asn61Ser), and the signals at TUBB1 and CD36 were partly driven by loss-of-function variants not annotated as pathogenic in ClinVar (rs199948010 and rs571975065). However, residual signal remained for these gene-based signals after adjusting for lead variants, suggesting that additional variants in Mendelian genes with impacts in general population cohorts remain to be identified. Gene-based signals were also identified at several GWAS-identified loci for genes not annotated for Mendelian platelet disorders (PTPRH, TET2, CHEK2), with somatic variation driving the result at TET2. These results highlight the value of whole genome sequencing in populations of diverse genetic ancestry to identify novel regulatory and coding signals, even for well-studied traits such as platelet traits.
View details for DOI 10.1093/hmg/ddab252
View details for PubMedID 34553764
-
Advances and challenges in quantitative delineation of the genetic architecture of complex traits.
Quantitative Biology (Beijing, China)
2021; 9 (2): 168-184
Abstract
Genome-wide association studies (GWAS) have been widely adopted in studies of human complex traits and diseases. This review surveys areas of active research: quantifying and partitioning trait heritability, fine mapping functional variants and integrative analysis, genetic risk prediction of phenotypes, and the analysis of sequencing studies that have identified millions of rare variants. Current challenges and opportunities are highlighted. GWAS have fundamentally transformed the field of human complex trait genetics. Novel statistical and computational methods have expanded the scope of GWAS and have provided valuable insights into the genetic architecture underlying complex phenotypes.
View details for DOI 10.15302/j-qb-021-0249
View details for Web of Science ID 000687996800007
View details for PubMedID 35492964
View details for PubMedCentralID PMC9053444
-
Identification of putative causal loci in whole-genome sequencing data via knockoff statistics.
Nature Communications
2021; 12 (1): 3152
Abstract
The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) and COPDGene samples from NHLBI Trans-Omics for Precision Medicine (TOPMed) Program we show that our method compared with conventional association tests can lead to substantially more discoveries.
View details for DOI 10.1038/s41467-021-22889-4
View details for PubMedID 34035245
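The selection step of a knockoff analysis can be sketched with the generic knockoff(+) filter of Barber and Candès: each variant j gets a statistic W_j that is large and positive when the original variant looks more important than its knockoff copy, and a data-dependent threshold is chosen so that the estimated false-discovery proportion stays at or below a target q. This is a simplified, single-knockoff illustration with hypothetical W values; the method described above additionally integrates multiple knockoffs and rare-variant association tests, which this sketch does not attempt.

```python
import math
from typing import List, Sequence


def knockoff_threshold(w: Sequence[float], q: float, offset: int = 1) -> float:
    """Data-dependent threshold of the knockoff filter.

    offset=1 gives the knockoff+ filter; offset=0 gives the original filter.
    Returns the smallest t in {|W_j| : W_j != 0} whose estimated
    false-discovery proportion (offset + #{W_j <= -t}) / max(1, #{W_j >= t})
    is at most q, or math.inf if no such t exists.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return math.inf


def knockoff_select(w: Sequence[float], q: float, offset: int = 1) -> List[int]:
    """Indices of variants whose statistic clears the knockoff threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical W statistics for ten variants.
    w = [5.0, 4.0, 3.0, 2.5, 2.0, -0.5, 1.0, 0.3, -1.5, 6.0]
    print(knockoff_select(w, q=0.2))
```

Negative W values act as a built-in estimate of how many null variants cross a given threshold, which is why no p-values are needed. In the original framework, `offset=1` (knockoff+) controls the false-discovery rate exactly, while `offset=0` controls a modified version of it.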
-
Genetic and non-genetic factors affecting the expression of COVID-19-relevant genes in the large airway epithelium.
Genome Medicine
2021; 13 (1): 66
Abstract
BACKGROUND: The large airway epithelial barrier provides one of the first lines of defense against respiratory viruses, including SARS-CoV-2, which causes COVID-19. Substantial inter-individual variability in disease course is hypothesized to be partially mediated by the differential regulation of the genes that interact with the SARS-CoV-2 virus or are involved in the subsequent host response. Here, we comprehensively investigated non-genetic and genetic factors influencing COVID-19-relevant bronchial epithelial gene expression. METHODS: We analyzed RNA-sequencing data from bronchial epithelial brushings obtained from uninfected individuals. We related ACE2 gene expression to host and environmental factors in the SPIROMICS cohort of smokers with and without chronic obstructive pulmonary disease (COPD) and replicated these associations in two asthma cohorts, SARP and MAST. To identify airway biology beyond ACE2 binding that may contribute to increased susceptibility, we used gene set enrichment analyses to determine whether gene expression changes indicative of a suppressed airway immune response observed early in SARS-CoV-2 infection are also observed in association with host factors. To identify host genetic variants affecting COVID-19 susceptibility in SPIROMICS, we performed expression quantitative trait locus (eQTL) mapping and investigated the phenotypic associations of the eQTL variants. RESULTS: We found that ACE2 expression was higher in relation to active smoking, obesity, and hypertension, which are known risk factors for COVID-19 severity, while an association with interferon-related inflammation was driven by the truncated, non-binding ACE2 isoform. We discovered that expression patterns of a suppressed airway immune response to early SARS-CoV-2 infection, compared to other viruses, are similar to patterns associated with obesity, hypertension, and cardiovascular disease, which may thus contribute to a COVID-19-susceptible airway environment. eQTL mapping identified regulatory variants for genes implicated in COVID-19, some of which had pheWAS evidence for a potential role in respiratory infections. CONCLUSIONS: These data provide evidence that clinically relevant variation in the expression of COVID-19-related genes is associated with host factors, environmental exposures, and likely host genetic variation.
View details for DOI 10.1186/s13073-021-00866-2
View details for PubMedID 33883027
-
Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: The NHLBI TOPMed program.
American Journal of Human Genetics
2021
Abstract
Whole-genome sequencing (WGS), a powerful tool for detecting novel coding and non-coding disease-causing variants, has largely been applied to clinical diagnosis of inherited disorders. Here we leveraged WGS data in up to 62,653 ethnically diverse participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and assessed statistical association of variants with seven red blood cell (RBC) quantitative traits. We discovered 14 single variant-RBC trait associations at 12 genomic loci that have not been reported previously. Several of the RBC trait-variant associations (RPN1, ELL2, MIDN, HBB, HBA1, PIEZO1, and G6PD) were replicated in independent GWAS datasets imputed to the TOPMed reference panel. Most of these discovered variants are rare/low frequency, and several are observed disproportionately among non-European ancestry (African, Hispanic/Latino, or East Asian) populations. We identified a 3-bp indel in PIEZO1, p.Lys2169del (g.88717175_88717177TCT[4]), common only in the Ashkenazi Jewish population, that is associated with higher mean corpuscular hemoglobin concentration (MCHC); PIEZO1 is the gene responsible for the Mendelian red cell disorder hereditary xerocytosis (MIM: 194380). In stepwise conditional analysis and in gene-based rare variant aggregated association analysis, we identified several variants in HBB, HBA1, TMPRSS6, and G6PD that represent the carrier state for known coding, promoter, or splice site loss-of-function variants that cause inherited RBC disorders. Finally, we applied base and nuclease editing to demonstrate that the sentinel variant rs112097551 (nearest gene RPN1) acts through a cis-regulatory element that exerts long-range control of the gene RUVBL1, which is essential for hematopoiesis. Together, these results demonstrate the utility of WGS in ethnically diverse population-based samples and gene editing for expanding knowledge of the genetic architecture of quantitative hematologic traits and suggest a continuum between complex trait and Mendelian red cell disorders.
View details for DOI 10.1016/j.ajhg.2021.04.003
View details for PubMedID 33887194
-
Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease.
Cell
2021
Abstract
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype-Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.
View details for DOI 10.1016/j.cell.2021.03.050
View details for PubMedID 33864768
-
Functional and structural basis of extreme conservation in vertebrate 5' untranslated regions.
Nature genetics
2021
Abstract
The lack of knowledge about extreme conservation in genomes remains a major gap in our understanding of the evolution of gene regulation. Here, we reveal an unexpected role of extremely conserved 5' untranslated regions (UTRs) in noncanonical translational regulation that is linked to the emergence of essential developmental features in vertebrate species. Endogenous deletion of conserved elements within these 5' UTRs decreased gene expression, and extremely conserved 5' UTRs possess cis-regulatory elements that promote cell-type-specific regulation of translation. We further developed in-cell mutate-and-map (icM2), a new methodology that maps RNA structure inside cells. Using icM2, we determined that an extremely conserved 5' UTR encodes multiple alternative structures and that each single nucleotide within the conserved element maintains the balance of alternative structures important to control the dynamic range of protein expression. These results explain how extreme sequence conservation can lead to RNA-level biological functions encoded in the untranslated regions of vertebrate genomes.
View details for DOI 10.1038/s41588-021-00830-1
View details for PubMedID 33821006
-
Inherited causes of clonal haematopoiesis in 97,691 whole genomes (vol 586, pg 763, 2020)
NATURE
2021; 591 (7851): E27
View details for DOI 10.1038/s41586-021-03280
View details for Web of Science ID 000632177100002
-
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.
Nature
2021; 590 (7845): 290–99
Abstract
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) [1]. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
View details for DOI 10.1038/s41586-021-03205-y
View details for PubMedID 33568819
-
Chromosome Xq23 is associated with lower atherogenic lipid concentrations and favorable cardiometabolic indices.
Nature communications
2021; 12 (1): 2182
Abstract
Autosomal genetic analyses of blood lipids have yielded key insights for coronary heart disease (CHD). However, X chromosome genetic variation is understudied for blood lipids in large sample sizes. We now analyze genetic and blood lipid data in a high-coverage whole X chromosome sequencing study of 65,322 multi-ancestry participants and perform replication among 456,893 European participants. Common alleles on chromosome Xq23 are strongly associated with reduced total cholesterol, LDL cholesterol, and triglycerides (min P = 8.5 × 10^-72), with similar effects for males and females. Chromosome Xq23 lipid-lowering alleles are associated with reduced odds for CHD among 42,545 cases and 591,247 controls (P = 1.7 × 10^-4), and reduced odds for diabetes mellitus type 2 among 54,095 cases and 573,885 controls (P = 1.4 × 10^-5). Although we observe an association with increased BMI, waist-to-hip ratio adjusted for BMI is reduced, bioimpedance analyses indicate increased gluteofemoral fat, and abdominal MRI analyses indicate reduced visceral adiposity. Co-localization analyses strongly correlate increased CHRDL1 gene expression, particularly in adipose tissue, with reduced concentrations of blood lipids.
View details for DOI 10.1038/s41467-021-22339-1
View details for PubMedID 33846329
-
Whole-genome sequencing in diverse subjects identifies genetic correlates of leukocyte traits: The NHLBI TOPMed program.
American journal of human genetics
2021
Abstract
Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts generated from 109,563,748 variants in the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) the association between a chromosome X, pseudo-autosomal region (PAR), noncoding variant located between cytokine receptor genes (CSF2RA and CRLF2) and lower eosinophil count; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence indicating that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found a burden of very rare FLT3 (13q12.2) variants associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples in identifying associations missed by European-ancestry-driven GWASs.
View details for DOI 10.1016/j.ajhg.2021.08.007
View details for PubMedID 34582791
-
Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics.
American journal of human genetics
2021
Abstract
Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects.
View details for DOI 10.1016/j.ajhg.2021.10.009
View details for PubMedID 34767756
-
Inherited causes of clonal haematopoiesis in 97,691 whole genomes.
Nature
2020
Abstract
Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown [1]. The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer [2-4] and coronary heart disease [5]; this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP) [6]. Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.
View details for DOI 10.1038/s41586-020-2819-2
View details for PubMedID 33057201
-
The GTEx Consortium atlas of genetic regulatory effects across human tissues
SCIENCE
2020; 369 (6509): 1318-+
View details for DOI 10.1126/science.aaz1776
View details for Web of Science ID 000569840300041
-
Trans-ethnic and Ancestry-Specific Blood-Cell Genetics in 746,667 Individuals from 5 Global Populations.
Cell
2020; 182 (5): 1198
Abstract
Most loci identified by GWASs have been found in populations of European ancestry (EUR). In trans-ethnic meta-analyses for 15 hematological traits in 746,667 participants, including 184,535 non-EUR individuals, we identified 5,552 trait-variant associations at p < 5 × 10^-9, including 71 novel associations not found in EUR populations. We also identified 28 additional novel variants in ancestry-specific, non-EUR meta-analyses, including an IL7 missense variant in South Asians associated with lymphocyte count in vivo and IL-7 secretion levels in vitro. Fine-mapping prioritized variants annotated as functional and generated 95% credible sets that were 30% smaller when using the trans-ethnic as opposed to the EUR-only results. We explored the clinical significance and predictive value of trans-ethnic variants in multiple populations and compared genetic architecture and the effect of natural selection on these blood phenotypes between populations. Altogether, our results for hematological traits highlight the value of a more global representation of populations in genetic studies.
View details for DOI 10.1016/j.cell.2020.06.045
View details for PubMedID 32888493
-
The Polygenic and Monogenic Basis of Blood Traits and Diseases.
Cell
2020; 182 (5): 1214
Abstract
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
View details for DOI 10.1016/j.cell.2020.08.008
View details for PubMedID 32888494
-
Genome-Wide Gene-Diabetes and Gene-Obesity Interaction Scan in 8,255 Cases and 11,900 Controls from PanScan and PanC4 Consortia.
Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology
2020; 29 (9): 1784–91
Abstract
BACKGROUND: Obesity and diabetes are major modifiable risk factors for pancreatic cancer. Interactions between genetic variants and diabetes/obesity have not previously been comprehensively investigated in pancreatic cancer at the genome-wide level.
METHODS: We conducted a gene-environment interaction (GxE) analysis including 8,255 cases and 11,900 controls from four pancreatic cancer genome-wide association study (GWAS) datasets (Pancreatic Cancer Cohort Consortium I-III and Pancreatic Cancer Case Control Consortium). Obesity (body mass index ≥30 kg/m2) and diabetes (duration ≥3 years) were the environmental variables of interest. Approximately 870,000 SNPs (minor allele frequency ≥0.005, genotyped in at least one dataset) were analyzed. Case-control (CC), case-only (CO), and joint-effect test methods were used for SNP-level GxE analysis. As a complementary approach, gene-based GxE analysis was also performed. Age, sex, study site, and principal components accounting for population substructure were included as covariates. Meta-analysis was applied to combine individual GWAS summary statistics.
RESULTS: No genome-wide significant interactions (departures from a log-additive odds model) with diabetes or obesity were detected at the SNP level by the CC or CO approaches. The joint-effect test detected numerous genome-wide significant GxE signals in the GWAS main effects top hit regions, but the significance diminished after adjusting for the GWAS top hits. In the gene-based analysis, a significant interaction of diabetes with variants in the FAM63A (family with sequence similarity 63 member A) gene (significance threshold P < 1.25 × 10^-6) was observed in the meta-analysis (P GxE = 1.2 × 10^-6, P Joint = 4.2 × 10^-7).
CONCLUSIONS: This analysis did not find significant GxE interactions at the SNP level but found one significant interaction with diabetes at the gene level. A larger sample size might unveil additional genetic factors via GxE scans.
IMPACT: This study may contribute to discovering the mechanism of diabetes-associated pancreatic cancer.
View details for DOI 10.1158/1055-9965.EPI-20-0275
View details for PubMedID 32546605
-
Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale.
Nature genetics
2020
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
View details for DOI 10.1038/s41588-020-0676-4
View details for PubMedID 32839606
-
Detecting fitness epistasis in recently admixed populations with genome-wide data.
BMC genomics
2020; 21 (1): 476
Abstract
Fitness epistasis, the interaction effect of genes at different loci on fitness, makes an important contribution to adaptive evolution. Although fitness interaction evidence has been observed in model organisms, it is more difficult to detect and remains poorly understood in human populations as a result of limited statistical power and experimental constraints. Fitness epistasis is inferred from non-independence between unlinked loci. We previously observed ancestral block correlation between chromosomes 4 and 6 in African Americans. The same approach fails when examining ancestral blocks on the same chromosome due to the strong confounding effect observed in a recently admixed population. We developed a novel approach to eliminate the bias caused by admixture linkage disequilibrium when searching for fitness epistasis on the same chromosome. We applied this approach in 16,252 unrelated African Americans and identified significant ancestral correlations in two pairs of genomic regions (P-value < 8.11 × 10^-7) on chromosomes 1 and 10. The ancestral correlations were not explained by population admixture. Historical African-European crossover events are reduced between pairs of epistatic regions. We observed multiple pairs of co-expressed genes shared by the two regions on each chromosome, including ADAR being co-expressed with IFI44 in almost all tissues and DARC being co-expressed with VCAM1, S1PR1 and ELTD1 in multiple tissues in the Genotype-Tissue Expression (GTEx) data. Moreover, the co-expressed gene pairs are associated with the same diseases/traits in the GWAS Catalog, such as white blood cell count, blood pressure, lung function, inflammatory bowel disease and educational attainment. Our analyses revealed two instances of fitness epistasis on chromosomes 1 and 10, and the findings suggest a potential approach to improving our understanding of adaptive evolution.
View details for DOI 10.1186/s12864-020-06874-7
View details for PubMedID 32652930
-
A Quantitative Proteome Map of the Human Body.
Cell
2020
Abstract
Determining protein levels in each tissue and how they compare with RNA levels is important for understanding human biology and disease as well as regulatory processes that control protein levels. We quantified the relative protein levels from over 12,000 genes across 32 normal human tissues. Tissue-specific or tissue-enriched proteins were identified and compared to transcriptome data. Many ubiquitous transcripts are found to encode tissue-specific proteins. Discordance of RNA and protein enrichment revealed potential sites of synthesis and action of secreted proteins. The tissue-specific distribution of proteins also provides an in-depth view of complex biological events that require the interplay of multiple tissues. Most importantly, our study demonstrated that protein tissue-enrichment information can explain phenotypes of genetic diseases in ways that transcript information alone cannot. Overall, our results demonstrate how understanding protein levels can provide insights into regulation, secretome, metabolism, and human diseases.
View details for DOI 10.1016/j.cell.2020.08.036
View details for PubMedID 32916130
-
RobNorm: Model-Based Robust Normalization Method for Labeled Quantitative Mass Spectrometry Proteomics Data.
Bioinformatics (Oxford, England)
2020
Abstract
Data normalization is an important step in processing proteomics data generated in mass spectrometry (MS) experiments, which aims to reduce sample-level variation and facilitate comparisons of samples. Previously published methods for normalization primarily depend on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data are generated from heterogeneous samples, such as from various tissue types. This led us to develop a novel data-driven normalization method that corrects systematic bias while maintaining underlying biological heterogeneity. To robustly correct the systematic bias, we used the density-power-weight method to down-weight outliers and extended the one-dimensional robust fitting method described in previous work (Windham, 1995; Fujisawa and Eguchi, 2008) to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. In simulation studies and analysis of real data from the Genotype-Tissue Expression (GTEx) project, we compared and evaluated the performance of RobNorm against other normalization methods. We found that the RobNorm approach exhibits the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples. RobNorm is available at https://github.com/mwgrassgreen/RobNorm.
View details for DOI 10.1093/bioinformatics/btaa904
View details for PubMedID 33098413
-
Whole Genome Sequencing Identifies CRISPLD2 as a Lung Function Gene in Children With Asthma
CHEST
2019; 156 (6): 1068–79
Abstract
Asthma is a common respiratory disorder with a highly heterogeneous nature that remains poorly understood. The objective was to use whole genome sequencing (WGS) data to identify regions of common genetic variation contributing to lung function in individuals with a diagnosis of asthma. WGS data were generated for 1,053 individuals from trios and extended pedigrees participating in the family-based Genetic Epidemiology of Asthma in Costa Rica study. Asthma affection status was defined through a physician's diagnosis of asthma, and most participants with asthma also had airway hyperresponsiveness (AHR) to methacholine. Family-based association tests for single variants were performed to assess the associations with lung function phenotypes. A genome-wide significant association was identified between baseline FEV1/FVC ratio and a single-nucleotide polymorphism in the top hit cysteine-rich secretory protein LCCL domain-containing 2 (CRISPLD2) (rs12051168; P = 3.6 × 10^-8 in the unadjusted model) that retained suggestive significance in the covariate-adjusted model (P = 5.6 × 10^-6). Rs12051168 was also nominally associated with other related phenotypes: baseline FEV1 (P = 3.3 × 10^-3), postbronchodilator (PB) FEV1 (P = 7.3 × 10^-3), and PB FEV1/FVC ratio (P = 2.7 × 10^-3). The identified baseline FEV1/FVC ratio and rs12051168 association was meta-analyzed and replicated in three independent cohorts in which most participants with asthma also had confirmed AHR (combined weighted z-score P = .015) but not in cohorts without information about AHR. These findings suggest that using specific asthma characteristics, such as AHR, can help identify more genetically homogeneous asthma subgroups with genotype-phenotype associations that may not be observed in all children with asthma. CRISPLD2 also may be important for baseline lung function in individuals with asthma who also may have AHR.
View details for DOI 10.1016/j.chest.2019.08.2202
View details for Web of Science ID 000500923700016
View details for PubMedID 31557467
-
A multi-ancestry genome-wide study incorporating gene-smoking interactions identifies multiple new loci for pulse pressure and mean arterial pressure.
Human molecular genetics
2019
Abstract
Elevated blood pressure (BP), a leading cause of global morbidity and mortality, is influenced by both genetic and lifestyle factors. Cigarette smoking is one such lifestyle factor. Across five ancestries, we performed a genome-wide gene-smoking interaction study of mean arterial pressure (MAP) and pulse pressure (PP) in 129,913 individuals in stage 1 and follow-up analysis in 480,178 additional individuals in stage 2. We report here 136 loci significantly associated with MAP and/or PP. Of these, 61 were previously published through main-effect analysis of BP traits, 37 were recently reported by us for systolic BP and/or diastolic BP through gene-smoking interaction analysis and 38 were newly identified (P < 5 × 10^-8, false discovery rate < 0.05). We also identified nine new signals near known loci. Of the 136 loci, 8 showed significant interaction with smoking status. They include CSMD1, previously reported for insulin resistance and BP in spontaneously hypertensive rats. Many of the 38 new loci show biologic plausibility for a role in BP regulation. SLC26A7 encodes a chloride/bicarbonate exchanger expressed in the renal outer medullary collecting duct. AVPR1A is widely expressed, including in vascular smooth muscle cells, kidney, myocardium and brain. FHAD1 is a long non-coding RNA overexpressed in heart failure. TMEM51 was associated with contractile function in cardiomyocytes. CASP9 plays a central role in cardiomyocyte apoptosis. Thirty novel loci were identified only in African ancestry. Our findings highlight the value of multi-ancestry investigations, particularly in studies of interaction with lifestyle factors, where genomic and lifestyle differences may contribute to novel findings.
View details for DOI 10.1093/hmg/ddz070
View details for PubMedID 31127295
-
Multi-ancestry genome-wide gene-smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids.
Nature genetics
2019; 51 (4): 636-648
Abstract
The concentrations of high- and low-density-lipoprotein cholesterol and triglycerides are influenced by smoking, but it is unknown whether genetic associations with lipids may be modified by smoking. We conducted a multi-ancestry genome-wide gene-smoking interaction study in 133,805 individuals with follow-up in an additional 253,467 individuals. Combined meta-analyses identified 13 new loci associated with lipids, some of which were detected only because association differed by smoking status. Additionally, we demonstrate the importance of including diverse populations, particularly in studies of interactions with lifestyle factors, where genomic and lifestyle differences by ancestry may contribute to novel findings.
View details for DOI 10.1038/s41588-019-0378-y
View details for PubMedID 30926973
View details for PubMedCentralID PMC6467258
View details for Web of Science ID 000462767500011
-
Multi-ancestry study of blood lipid levels identifies four loci interacting with physical activity.
Nature communications
2019; 10 (1): 376
Abstract
Many genetic loci affect circulating lipid levels, but it remains unknown whether lifestyle factors, such as physical activity, modify these genetic effects. To identify lipid loci interacting with physical activity, we performed genome-wide analyses of circulating HDL cholesterol, LDL cholesterol, and triglyceride levels in up to 120,979 individuals of European, African, Asian, Hispanic, and Brazilian ancestry, with follow-up of suggestive associations in an additional 131,012 individuals. We find four loci, in/near CLASP1, LHX1, SNTA1, and CNTNAP2, that are associated with circulating lipid levels through interaction with physical activity; higher levels of physical activity enhance the HDL cholesterol-increasing effects of the CLASP1, LHX1, and SNTA1 loci and attenuate the LDL cholesterol-increasing effect of the CNTNAP2 locus. The CLASP1, LHX1, and SNTA1 regions harbor genes linked to muscle function and lipid metabolism. Our results elucidate the role of physical activity interactions in the genetic contribution to blood lipid levels.
View details for DOI 10.1038/s41467-018-08008-w
View details for Web of Science ID 000456286400004
View details for PubMedID 30670697
-
Evaluating the strength of genetic results: Risks and responsibilities.
PLoS genetics
2019; 15 (10): e1008437
View details for DOI 10.1371/journal.pgen.1008437
View details for PubMedID 31603891
-
Association of APOL1 Risk Alleles with Cardiovascular Disease in African Americans in the Million Veteran Program.
Circulation
2019
Abstract
Approximately 13% of African-American individuals carry two copies of the APOL1 risk alleles G1 or G2, which are associated with a 1.5- to 2.5-fold increased risk of chronic kidney disease (CKD). There have been conflicting reports as to whether an association exists between APOL1 risk alleles and cardiovascular disease, independent of the effects of APOL1 on kidney disease. We sought to test the association of APOL1 G1/G2 alleles with coronary artery disease (CAD), peripheral artery disease (PAD), and stroke among African American individuals in the Million Veteran Program (MVP). We performed a time-to-event analysis of retrospective electronic health record (EHR) data using Cox proportional hazards and competing risks Fine and Gray sub-distribution hazard models. The primary exposure was APOL1 risk allele status. The primary outcome was incident CAD among individuals without CKD during the 12.5-year follow-up period. Separately, we analyzed the cross-sectional association of APOL1 risk allele status with lipid traits and 115 cardiovascular diseases using phenome-wide association. Among 30,903 African American MVP participants, 3,941 (13%) carried the two APOL1 risk allele high-risk genotype. Individuals with normal kidney function at baseline with two risk alleles had slightly higher risk of developing CAD compared to those with no risk alleles (Hazard Ratio (HR): 1.11, 95% Confidence Interval (CI): 1.01-1.21, p=0.039). Similarly, modest associations were identified with incident stroke (HR: 1.20, 95% CI: 1.05-1.36, p=0.007) and PAD (HR: 1.15, 95% CI: 1.01-1.29, p=0.031). When modeling both cardiovascular and renal outcomes, APOL1 was strongly associated with incident renal disease, while no significant association with the cardiovascular disease endpoints could be detected.
Cardiovascular phenome-wide association analyses did not identify additional significant associations with cardiovascular disease subsets. APOL1 risk variants display a modest association with cardiovascular disease, and this association is likely mediated by the known APOL1 association with CKD.
View details for DOI 10.1161/CIRCULATIONAHA.118.036589
View details for PubMedID 31337231
-
Doubling down on forensic twin studies.
PLoS genetics
2018; 14 (12): e1007831
View details for PubMedID 30571773
-
Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program.
Nature genetics
2018; 50 (11): 1514-+
Abstract
The Million Veteran Program (MVP) was established in 2011 as a national research initiative to determine how genetic variation influences the health of US military veterans. Here we genotyped 312,571 MVP participants using a custom biobank array and linked the genetic data to laboratory and clinical phenotypes extracted from electronic health records covering a median of 10.0 years of follow-up. Among 297,626 veterans with at least one blood lipid measurement, including 57,332 black and 24,743 Hispanic participants, we tested up to around 32 million variants for association with lipid levels and identified 118 novel genome-wide significant loci after meta-analysis with data from the Global Lipids Genetics Consortium (total n>600,000). Through a focus on mutations predicted to result in a loss of gene function and a phenome-wide association study, we propose novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease).
View details for DOI 10.1038/s41588-018-0222-9
View details for Web of Science ID 000448398000007
View details for PubMedID 30275531
-
Novel genetic associations for blood pressure identified via gene-alcohol interaction in up to 570K individuals across multiple ancestries.
PloS one
2018; 13 (6): e0198166
Abstract
Heavy alcohol consumption is an established risk factor for hypertension; the mechanism by which alcohol consumption impacts blood pressure (BP) regulation remains unknown. We hypothesized that a genome-wide association study accounting for gene-alcohol consumption interaction for BP might identify additional BP loci and contribute to the understanding of alcohol-related BP regulation. We conducted a large two-stage investigation incorporating joint testing of main genetic effects and single nucleotide variant (SNV)-alcohol consumption interactions. In Stage 1, genome-wide discovery meta-analyses in ≈131K individuals across several ancestry groups yielded 3,514 SNVs (245 loci) with suggestive evidence of association (P < 1.0 × 10⁻⁵). In Stage 2, these SNVs were tested for independent external replication in ≈440K individuals across multiple ancestries. We identified and replicated (at the Bonferroni-corrected threshold) five novel BP loci (380 SNVs in 21 genes) and 49 previously reported BP loci (2,159 SNVs in 109 genes) in European-ancestry and multi-ancestry meta-analyses (P < 5.0 × 10⁻⁸). For African-ancestry samples, we detected 18 potentially novel BP loci (P < 5.0 × 10⁻⁸) in Stage 1 that warrant further replication. Additionally, correlated meta-analysis identified eight novel BP loci (11 genes). Several genes in these loci (e.g., PINX1, GATA4, BLK, FTO and GABBR2) have been previously reported to be associated with alcohol consumption. These findings provide insights into the role of alcohol consumption in the genetic architecture of hypertension.
View details for PubMedID 29912962
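The joint testing of main genetic effects and SNV-alcohol interactions described in the abstract above is commonly carried out as a 2-degree-of-freedom Wald test on the pair (β_G, β_GxE). The sketch below assumes that per-variant estimates and their 2 × 2 covariance matrix are already available from a regression or meta-analysis; the function name joint_2df_test is hypothetical, and this is an illustration of the general test, not the consortium's analysis pipeline.

```python
import math


def joint_2df_test(beta_g, beta_gxe, var_g, var_gxe, cov_g_gxe):
    """Wald test of H0: beta_g = beta_gxe = 0 with 2 degrees of freedom.

    beta_g: main genetic effect; beta_gxe: gene-by-exposure interaction effect.
    var_g, var_gxe, cov_g_gxe: entries of the 2x2 covariance matrix V of the estimates.
    Returns (chi-square statistic, p-value).
    """
    # Determinant of V; it must be positive for V to be invertible
    det = var_g * var_gxe - cov_g_gxe ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b^T V^{-1} b, using the closed-form inverse of a 2x2 matrix
    stat = (var_gxe * beta_g ** 2
            - 2.0 * cov_g_gxe * beta_g * beta_gxe
            + var_g * beta_gxe ** 2) / det
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

For example, joint_2df_test(0.5, 0.0, 0.01, 0.04, 0.0) gives a statistic of 25 and a p-value of exp(−12.5), roughly 3.7 × 10⁻⁶. The closed-form exp(−x/2) tail holds only for exactly 2 degrees of freedom; other degrees of freedom need a general chi-square survival function.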
-
A Large-Scale Multi-ancestry Genome-wide Study Accounting for Smoking Behavior Identifies Multiple Significant Loci for Blood Pressure.
American journal of human genetics
2018; 102 (3): 375–400
Abstract
Genome-wide association analysis has advanced understanding of blood pressure (BP), a major risk factor for vascular conditions such as coronary heart disease and stroke. Accounting for smoking behavior may help identify BP loci and extend our knowledge of its genetic architecture. We performed genome-wide association meta-analyses of systolic and diastolic BP incorporating gene-smoking interactions in 610,091 individuals. Stage 1 analysis examined ∼18.8 million SNPs and small insertion/deletion variants in 129,913 individuals from four ancestries (European, African, Asian, and Hispanic) with follow-up analysis of promising variants in 480,178 additional individuals from five ancestries. We identified 15 loci that were genome-wide significant (p < 5 × 10⁻⁸) in stage 1 and formally replicated in stage 2. A combined stage 1 and 2 meta-analysis identified 66 additional genome-wide significant loci (13, 35, and 18 loci in European, African, and trans-ancestry, respectively). A total of 56 known BP loci were also identified by our results (p < 5 × 10⁻⁸). Of the newly identified loci, ten showed significant interaction with smoking status, but none of them were replicated in stage 2. Several loci were identified in African ancestry, highlighting the importance of genetic studies in diverse populations. The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits. They also highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function (CDKN1B, BCAR1-CFDP1, PXDN, EEA1), ciliopathies (SDCCAG8, RPGRIP1L), telomere maintenance (TNKS, PINX1, AKTIP), and central dopaminergic signaling (MSRA, EBF2).
View details for PubMedID 29455858
-
Exome-wide association study of plasma lipids in >300,000 individuals.
Nature genetics
2017; 49 (12): 1758-+
Abstract
We screened variants on an exome-focused genotyping array in >300,000 participants (replication in >280,000 participants) and identified 444 independent variants in 250 loci significantly associated with total cholesterol (TC), high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), and/or triglycerides (TG). At two loci (JAK2 and A1CF), experimental analysis in mice showed lipid changes consistent with the human data. We also found that: (i) beta-thalassemia trait carriers displayed lower TC and were protected from coronary artery disease (CAD); (ii) excluding the CETP locus, there was not a predictable relationship between plasma HDL-C and risk for age-related macular degeneration; (iii) only some mechanisms of lowering LDL-C appeared to increase risk for type 2 diabetes (T2D); and (iv) TG-lowering alleles involved in hepatic production of TG-rich lipoproteins (TM6SF2 and PNPLA3) tracked with higher liver fat, higher risk for T2D, and lower risk for CAD, whereas TG-lowering alleles involved in peripheral lipolysis (LPL and ANGPTL4) had no effect on liver fat but decreased risks for both T2D and CAD.
View details for PubMedID 29083408
View details for PubMedCentralID PMC5709146
-
Skin color variation in Africa.
Science (New York, N.Y.)
2017; 358 (6365): 867-868
View details for DOI 10.1126/science.aaq1322
View details for PubMedID 29146796
-
Trans-ancestry Fine Mapping and Molecular Assays Identify Regulatory Variants at the ANGPTL8 HDL-C GWAS Locus.
G3 (Bethesda, Md.)
2017; 7 (9): 3217-3227
Abstract
Recent genome-wide association studies (GWAS) have identified variants associated with high-density lipoprotein cholesterol (HDL-C) located in or near the ANGPTL8 gene. Given the extensive sharing of GWAS loci across populations, we hypothesized that at least one shared variant at this locus affects HDL-C. The HDL-C-associated variants are coincident with expression quantitative trait loci for ANGPTL8 and DOCK6 in subcutaneous adipose tissue; however, only ANGPTL8 expression levels are associated with HDL-C levels. We identified a 400-bp promoter region of ANGPTL8 and enhancer regions within 5 kb that contribute to regulating expression in liver and adipose. To identify variants functionally responsible for the HDL-C association, we performed fine-mapping analyses and selected 13 candidate variants that overlap putative regulatory regions to test for allelic differences in regulatory function. Of these variants, rs12463177-G increased transcriptional activity (1.5-fold, P = 0.004) and showed differential protein binding. Six additional variants (rs17699089, rs200788077, rs56322906, rs3760782, rs737337, and rs3745683) showed evidence of allelic differences in transcriptional activity and/or protein binding. Taken together, these data suggest a regulatory mechanism at the ANGPTL8 HDL-C GWAS locus involving tissue-selective expression and at least one functional variant.
View details for DOI 10.1534/g3.117.300088
View details for PubMedID 28754724
View details for PubMedCentralID PMC5592946
-
Genome-Wide Association Study of Blood Pressure Traits by Hispanic/Latino Background: the Hispanic Community Health Study/Study of Latinos.
Scientific reports
2017; 7: 10348
Abstract
Hypertension prevalence varies between ethnic groups, possibly due to differences in genetic, environmental, and cultural determinants. Hispanic/Latino Americans are a diverse and understudied population. We performed a genome-wide association study (GWAS) of blood pressure (BP) traits in 12,278 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In the discovery phase we identified eight previously unreported BP loci. In the replication stage, we tested these loci in the 1982 Pelotas Birth Cohort Study of admixed Southern Brazilians, the COGENT-BP study of African descent, women of European descent from the Women's Health Initiative (WHI), and a sample of European descent from the UK Biobank. No loci met the Bonferroni-adjusted level of statistical significance (0.0024). Two loci had marginal evidence of replication: rs78701042 (NGF) with diastolic BP (P = 0.008 in the 1982 Pelotas Birth Cohort Study), and rs7315692 (SLC5A8) with systolic BP (P = 0.007 in European ancestry replication). We investigated whether previously reported loci associated with BP in studies of European, African, and Asian ancestry generalize to Hispanics/Latinos. Overall, 26% of the known associations in studies of individuals of European and Chinese ancestries generalized, while only a single association previously discovered in people of African descent generalized.
View details for DOI 10.1038/s41598-017-09019-1
View details for Web of Science ID 000408997700047
View details for PubMedID 28871152
View details for PubMedCentralID PMC5583292
-
Joint genotype- and ancestry-based genome-wide association studies in admixed populations.
Genetic epidemiology
2017; 41 (6): 555–66
Abstract
In genome-wide association studies (GWAS), genetic loci that influence complex traits are localized by inspecting associations between genotypes of genetic markers and the values of the trait of interest. Admixture mapping, on the other hand, which is performed in populations formed by a recent mix of two ancestral groups, relies on the ancestry information at each locus (locus-specific ancestry). Recently it has been proposed to jointly model genotype and locus-specific ancestry within the framework of single-marker tests. Here, we extend this approach for population-based GWAS in the direction of multimarker models. A modified version of the Bayesian information criterion is developed for building a multilocus model that accounts for the differential correlation structure due to linkage disequilibrium (LD) and admixture LD. Simulation studies and a real data example illustrate the advantages of this new approach compared to single-marker analysis or modern model selection strategies based on separately analyzing genotype and ancestry data, as well as to single-marker analysis combining genotypic and ancestry information. Depending on the signal strength, our procedure automatically chooses whether genotypic or locus-specific ancestry markers are added to the model. This results in a good compromise between the power to detect causal mutations and the precision of their localization. The proposed method has been implemented in R and is available at http://www.math.uni.wroc.pl/~mbogdan/admixtures/.
View details for PubMedID 28657151
-
Inference on the Genetic Basis of Eye and Skin Color in an Admixed Population via Bayesian Linear Mixed Models.
Genetics
2017; 206 (2): 1113-1126
Abstract
Genetic association studies in admixed populations are underrepresented in the genomics literature, with a key concern for researchers being the adequate control of spurious associations due to population structure. Linear mixed models (LMMs) are well suited for genome-wide association studies (GWAS) because they account for both population stratification and cryptic relatedness and achieve increased statistical power by jointly modeling all genotyped markers. Additionally, Bayesian LMMs allow for more flexible assumptions about the underlying distribution of genetic effects, and can concurrently estimate the proportion of phenotypic variance explained by genetic markers. Using three recently published Bayesian LMMs, Bayes R, BSLMM, and BOLT-LMM, we investigate an existing data set on eye (n = 625) and skin (n = 684) color from Cape Verde, an island nation off West Africa that is home to individuals with a broad range of phenotypic values for eye and skin color due to the mix of West African and European ancestry. We use simulations to demonstrate the utility of Bayesian LMMs for mapping loci and studying the genetic architecture of quantitative traits in admixed populations. The Bayesian LMMs provide evidence for two new pigmentation loci: one for eye color (AHRR) and one for skin color (DDB1).
View details for DOI 10.1534/genetics.116.193383
View details for PubMedID 28381588
-
A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq Data.
Journal of computational biology : a journal of computational molecular cell biology
2017
Abstract
Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as the count data generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating direction method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundance genes and edges around regulatory hubs.
View details for DOI 10.1089/cmb.2017.0053
View details for PubMedID 28557607
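The abstract above approximates the Poisson log-normal likelihood with Laplace's method. As a one-gene illustration only (the published method is multivariate, penalized, and optimized with ADMM, none of which is shown here), the sketch below assumes a single count y ~ Poisson(exp(z)) with latent z ~ N(mu, sigma2), finds the mode of the integrand with Newton's method, and applies the Laplace approximation. The function name pln_loglik_laplace is hypothetical.

```python
import math


def pln_loglik_laplace(y, mu, sigma2, max_iter=100, tol=1e-12):
    """Laplace approximation to log P(y) for y ~ Poisson(exp(z)), z ~ N(mu, sigma2)."""

    def f(z):
        # Log-integrand, keeping only the terms that depend on z
        return y * z - math.exp(z) - (z - mu) ** 2 / (2.0 * sigma2)

    # Newton's method on f'(z) = 0. f is strictly concave, so the mode is unique.
    # Starting at log(y + 1) avoids the very large first step (and overflow in exp)
    # that a start at mu can produce when y is much larger than exp(mu).
    z = math.log(y + 1.0)
    for _ in range(max_iter):
        grad = y - math.exp(z) - (z - mu) / sigma2
        hess = -math.exp(z) - 1.0 / sigma2  # always negative
        step = grad / hess
        z -= step
        if abs(step) < tol:
            break

    # log P(y) ~= f(z*) - log(y!) - 0.5*log(2*pi*sigma2) + 0.5*log(2*pi / -f''(z*)),
    # which simplifies to the expression returned here.
    return f(z) - math.lgamma(y + 1.0) - 0.5 * math.log(sigma2 * math.exp(z) + 1.0)
```

As sigma2 approaches zero the result approaches the ordinary Poisson log-probability with rate exp(mu), which is a convenient sanity check; for moderate counts the approximation is typically close to direct numerical integration over z.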
-
Single-trait and multi-trait genome-wide association analyses identify novel loci for blood pressure in African-ancestry populations.
PLoS genetics
2017; 13 (5)
Abstract
Hypertension is a leading cause of global disease, mortality, and disability. While individuals of African descent suffer a disproportionate burden of hypertension and its complications, they have been underrepresented in genetic studies. To identify novel susceptibility loci for blood pressure and hypertension in people of African ancestry, we performed both single-trait and multiple-trait genome-wide association analyses. We analyzed 21 genome-wide association studies comprising 31,968 individuals of African ancestry, and validated our results with an additional 54,395 individuals from multi-ethnic studies. These analyses identified nine loci with eleven independent variants that reached genome-wide significance (P < 1.25 × 10⁻⁸) for systolic or diastolic blood pressure, hypertension, or combined traits. Single-trait analyses identified two loci (TARID/TCF21 and LLPH/TMBIM4) and multiple-trait analyses identified one novel locus (FRMD3) for blood pressure. At these three loci, as well as at GRP20/CDH17, associated variants had alleles common only in African-ancestry populations. Functional annotation showed enrichment for genes expressed in immune and kidney cells, as well as in heart and vascular cells/tissues. Experiments driven by these findings and using angiotensin II-induced hypertension in mice showed altered kidney mRNA expression of six genes, suggesting their potential role in hypertension. Our study provides new evidence for genes related to hypertension susceptibility and underscores the need to study African-ancestry populations in order to identify biologic factors contributing to hypertension.
View details for DOI 10.1371/journal.pgen.1006728
View details for PubMedID 28498854
-
Genome-wide survey in African Americans demonstrates potential epistasis of fitness in the human genome.
Genetic epidemiology
2017; 41 (2): 122-135
Abstract
The role played by epistasis between alleles at unlinked loci in shaping population fitness has been debated for many years, and the existing evidence has been accumulated mainly from model organisms. In model organisms, fitness epistasis can be systematically inferred by detecting nonindependence of genotypic values between loci in a population and confirmed by examining the number of offspring produced in two-locus genotype groups. No systematic study has been conducted to detect epistasis of fitness in humans owing to experimental constraints. In this study, we developed a novel method to detect fitness epistasis by testing the correlation between local ancestries on different chromosomes in an admixed population. We inferred local ancestry across the genome in 16,252 unrelated African Americans and systematically examined the pairwise correlations between the genomic regions on different chromosomes. Our analysis revealed a pair of genomic regions on chromosomes 4 and 6 that show significant local ancestry correlation (P-value = 4.01 × 10⁻⁸) that can potentially be attributed to fitness epistasis. However, we also observed substantial local ancestry correlation that cannot be explained by systematic ancestry inference bias. To our knowledge, this study is the first to systematically examine evidence of fitness epistasis across the human genome.
View details for DOI 10.1002/gepi.22026
View details for PubMedID 27917522
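The abstract above looks for potential fitness epistasis by testing whether local ancestries at regions on different chromosomes are correlated across admixed individuals. In an admixed sample, individuals with more genome-wide (global) ancestry from one source population also tend to carry more local ancestry from it at every region, which correlates unlinked regions even without epistasis. The sketch below therefore assumes a partial correlation given global ancestry, tested with Fisher's z-transform; this adjustment, the input encoding, and the function names are illustrative assumptions, not the published method.

```python
import math


def _residualize(x, g):
    """Residuals of x after least-squares regression on g (with an intercept)."""
    n = len(x)
    mean_x, mean_g = sum(x) / n, sum(g) / n
    sxx = sum((gi - mean_g) ** 2 for gi in g)
    sxy = sum((gi - mean_g) * (xi - mean_x) for xi, gi in zip(x, g))
    slope = sxy / sxx if sxx > 0 else 0.0
    return [xi - mean_x - slope * (gi - mean_g) for xi, gi in zip(x, g)]


def ancestry_correlation_test(anc_a, anc_b, global_anc):
    """Partial correlation of local ancestry at two unlinked regions, given global ancestry.

    anc_a, anc_b: per-individual local ancestry dosage (e.g., 0, 1 or 2 copies of one
    source ancestry) at region A and region B.
    global_anc: per-individual genome-wide ancestry proportion.
    Returns (r, two-sided p-value).
    """
    res_a = _residualize(anc_a, global_anc)
    res_b = _residualize(anc_b, global_anc)
    # Residuals have mean zero, so no further centering is needed
    num = sum(a * b for a, b in zip(res_a, res_b))
    den = math.sqrt(sum(a * a for a in res_a) * sum(b * b for b in res_b))
    r = num / den
    r = max(min(r, 1.0 - 1e-15), -1.0 + 1e-15)  # keep atanh finite
    # Fisher's z for a partial correlation with one conditioning variable uses sqrt(n - 4)
    z = math.atanh(r) * math.sqrt(len(anc_a) - 4)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return r, p_value
```

In a genome-wide scan over all pairs of regions on different chromosomes, the resulting p-values would still need correction for the large number of pairs tested.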
-
Genome-wide Trans-ethnic Meta-analysis Identifies Seven Genetic Loci Influencing Erythrocyte Traits and a Role for RBPMS in Erythropoiesis.
American journal of human genetics
2017; 100 (1): 51–63
Abstract
Genome-wide association studies (GWASs) have identified loci for erythrocyte traits in primarily European ancestry populations. We conducted GWAS meta-analyses of six erythrocyte traits in 71,638 individuals from European, East Asian, and African ancestries using a Bayesian approach to account for heterogeneity in allelic effects and variation in the structure of linkage disequilibrium between ethnicities. We identified seven loci for erythrocyte traits, including a locus (RBPMS/GTF2E2) associated with mean corpuscular hemoglobin and mean corpuscular volume. Statistical fine-mapping at this locus pointed to RBPMS and excluded nearby GTF2E2. Using morpholino knockdown in zebrafish to evaluate loss of function, we observed a strong in vivo erythropoietic effect for RBPMS but not for GTF2E2, supporting the statistical fine-mapping at this locus and demonstrating that RBPMS is a regulator of erythropoiesis. Our findings show the utility of trans-ethnic GWASs for discovery and characterization of genetic loci influencing hematologic traits.
View details for PubMedID 28017375
View details for PubMedCentralID PMC5223059
-
Leveraging Multi-ethnic Evidence for Risk Assessment of Quantitative Traits in Minority Populations.
American journal of human genetics
2017; 101 (2): 218–26
Abstract
An essential component of precision medicine is the ability to predict an individual's risk of disease based on genetic and non-genetic factors. For complex traits and diseases, assessing the risk due to genetic factors is challenging because it requires knowledge of both the identity of variants that influence the trait and their corresponding allelic effects. Although the set of risk variants and their allelic effects may vary between populations, a large proportion of these variants were identified based on studies in populations of European descent. Heterogeneity in genetic architecture underlying complex traits and diseases, while broadly acknowledged, remains poorly characterized. Ignoring such heterogeneity likely reduces predictive accuracy for minority individuals. In this study, we propose an approach, called XP-BLUP, which ameliorates this ethnic disparity by combining trans-ethnic and ethnic-specific information. We build a polygenic model for complex traits that distinguishes candidate trait-relevant variants from the rest of the genome. The set of candidate variants is selected based on studies in any human population, yet the allelic effects are evaluated in a population-specific fashion. Simulation studies and real data analyses demonstrate that XP-BLUP adaptively utilizes trans-ethnic information and can substantially improve predictive accuracy in minority populations. At the same time, our study highlights the importance of the continued expansion of minority cohorts.
View details for PubMedID 28757202
View details for PubMedCentralID PMC5544393
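The abstract above describes a polygenic model that distinguishes candidate trait-relevant variants from the rest of the genome. One standard way to express that idea is BLUP of SNP effects with two variance components: if y = Xβ + e, with β_j ~ N(0, σ_k²) for the group k that variant j belongs to and e ~ N(0, σ_e² I), the BLUP of β equals ridge regression with a group-specific penalty λ_k = σ_e²/σ_k². The sketch below assumes centered genotypes and phenotype (no intercept) and user-supplied penalties. It is a simplified illustration of the two-component idea, not the published XP-BLUP implementation, and the function names are hypothetical.

```python
def _solve(A, b):
    """Solve A x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def two_component_blup(X, y, is_candidate, lam_candidate, lam_rest):
    """SNP-effect BLUP with separate shrinkage for candidate and non-candidate variants.

    X: n x p centered genotype matrix (list of rows); y: centered phenotype (length n).
    Solves (X^T X + diag(lambda_j)) beta = X^T y, where lambda_j is lam_candidate for
    candidate variants and lam_rest otherwise (lambda = sigma_e^2 / sigma_k^2).
    """
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for j in range(p):
        XtX[j][j] += lam_candidate if is_candidate[j] else lam_rest
    return _solve(XtX, Xty)
```

With a small penalty on the candidate group and a large one on the remaining variants, candidate effects are estimated almost freely while the rest are shrunk toward zero. In a real analysis the penalties would come from variance components estimated in the target population rather than being fixed by hand.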
-
Landscape of X chromosome inactivation across human tissues.
Nature
2017; 550 (7675): 244–48
Abstract
X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.
View details for PubMedID 29022598
-
The impact of rare variation on gene expression across tissues.
Nature
2017; 550 (7675): 239–43
Abstract
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
View details for PubMedID 29022581
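The abstract above identifies individuals with extreme expression of a gene across many tissues. A simple way to sketch that step is to z-score a gene's expression within each tissue and summarize each individual by the median z-score across the tissues in which they were sampled. The |median z| ≥ 2 cutoff used here is an assumption for illustration, the function name is hypothetical, and the sketch omits the normalization and covariate correction a real GTEx analysis would require; it also does not implement RIVER.

```python
import statistics


def multi_tissue_outliers(expr, threshold=2.0):
    """Flag expression outliers for one gene from per-tissue expression values.

    expr: dict mapping tissue -> dict mapping individual -> expression level.
    Expression is z-scored within each tissue; each individual's score is the
    median z-score across the tissues in which they were measured.
    Returns {individual: median_z} for individuals with |median_z| >= threshold.
    """
    z_by_ind = {}
    for tissue, values in expr.items():
        levels = list(values.values())
        if len(levels) < 2:
            continue  # a standard deviation needs at least two individuals
        mean = statistics.mean(levels)
        sd = statistics.stdev(levels)
        if sd == 0:
            continue  # no variation in this tissue, so no z-scores
        for ind, level in values.items():
            z_by_ind.setdefault(ind, []).append((level - mean) / sd)
    medians = {ind: statistics.median(zs) for ind, zs in z_by_ind.items()}
    return {ind: m for ind, m in medians.items() if abs(m) >= threshold}
```

One design consequence worth noting: with n individuals in a tissue, a sample z-score cannot exceed (n − 1)/√n in absolute value, so tissues with very few samples can never produce outliers at this cutoff.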
-
Genetic effects on gene expression across human tissues.
Nature
2017; 550 (7675): 204–13
Abstract
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
View details for PubMedID 29022597
-
Dynamic landscape and regulation of RNA editing in mammals.
Nature
2017; 550 (7675): 249–54
Abstract
Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.
View details for PubMedID 29022589
-
Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease.
Nature genetics
2017; 49 (12): 1664–70
Abstract
Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.
View details for DOI 10.1038/ng.3969
View details for PubMedID 29019975
-
Gene by Environment Investigation of Incident Lung Cancer Risk in African-Americans.
EBioMedicine
2016; 4: 153-161
Abstract
Genome-wide association studies have identified polymorphisms linked to both smoking exposure and risk of lung cancer. The degree to which lung cancer risk is driven by increased smoking, genetics, or gene-environment interactions is not well understood. We analyzed associations between 28 single nucleotide polymorphisms (SNPs) previously associated with smoking quantity and lung cancer in 7156 African-American females in the Women's Health Initiative (WHI), then analyzed main effects of top nominally significant SNPs and interactions between SNPs, cigarettes per day (CPD) and pack-years for lung cancer in an independent, multi-center case-control study of African-American females and males (1078 lung cancer cases and 822 controls). Nine nominally significant SNPs for CPD in WHI were associated with incident lung cancer (corrected p-values from 0.027 to 6.09 × 10⁻⁵). CPD was found to be a nominally significant effect modifier between SNP and lung cancer for six SNPs, including CHRNA5 rs2036527[A] (β(SNP×CPD) = −0.017, p = 0.0061, corrected p = 0.054), which was associated with CPD in a previous genome-wide meta-analysis of African-Americans. These results suggest that chromosome 15q25.1 variants are robustly associated with CPD and lung cancer in African-Americans and that the allelic dose effect of these polymorphisms on lung cancer risk is most pronounced in lighter smokers.
View details for DOI 10.1016/j.ebiom.2016.01.002
View details for PubMedID 26981579
-
Meta-analysis of lipid-traits in Hispanics identifies novel loci, population-specific effects, and tissue-specific enrichment of eQTLs
SCIENTIFIC REPORTS
2016; 6
Abstract
We performed genome-wide meta-analysis of lipid traits on three samples of Mexican and Mexican American ancestry comprising 4,383 individuals, and followed up significant and highly suggestive associations in three additional Hispanic samples comprising 7,876 individuals. Genome-wide significant signals were observed in or near CELSR2, ZNF259/APOA5, KANK2/DOCK6 and NCAN/MAU2 for total cholesterol, LPL, ABCA1, ZNF259/APOA5, LIPC and CETP for HDL cholesterol, CELSR2, APOB and NCAN/MAU2 for LDL cholesterol, and GCKR, TRIB1, ZNF259/APOA5 and NCAN/MAU2 for triglycerides. Linkage disequilibrium and conditional analyses indicate that signals observed at ABCA1 and LIPC for HDL cholesterol and NCAN/MAU2 for triglycerides are independent of previously reported lead SNP associations. Analyses of lead SNPs from the European Global Lipids Genetics Consortium (GLGC) dataset in our Hispanic samples show remarkable concordance of direction of effects as well as strong correlation in effect sizes. A meta-analysis of the European GLGC and our Hispanic datasets identified five novel regions reaching genome-wide significance: two for total cholesterol (FN1 and SAMM50), two for HDL cholesterol (LOC100996634 and COPB1) and one for LDL cholesterol (LINC00324/CTC1/PFAS). The top meta-analysis signals were found to be enriched for SNPs associated with gene expression in a tissue-specific fashion, suggesting an enrichment of tissue-specific function in lipid-associated loci.
View details for DOI 10.1038/srep19429
View details for Web of Science ID 000368335800001
View details for PubMedCentralID PMC4726092
-
Meta-analysis of lipid-traits in Hispanics identifies novel loci, population-specific effects, and tissue-specific enrichment of eQTLs.
Scientific reports
2016; 6: 19429
Abstract
We performed genome-wide meta-analysis of lipid traits on three samples of Mexican and Mexican American ancestry comprising 4,383 individuals, and followed up significant and highly suggestive associations in three additional Hispanic samples comprising 7,876 individuals. Genome-wide significant signals were observed in or near CELSR2, ZNF259/APOA5, KANK2/DOCK6 and NCAN/MAU2 for total cholesterol, LPL, ABCA1, ZNF259/APOA5, LIPC and CETP for HDL cholesterol, CELSR2, APOB and NCAN/MAU2 for LDL cholesterol, and GCKR, TRIB1, ZNF259/APOA5 and NCAN/MAU2 for triglycerides. Linkage disequilibrium and conditional analyses indicate that signals observed at ABCA1 and LIPC for HDL cholesterol and NCAN/MAU2 for triglycerides are independent of previously reported lead SNP associations. Analyses of lead SNPs from the European Global Lipids Genetics Consortium (GLGC) dataset in our Hispanic samples show remarkable concordance of direction of effects as well as strong correlation in effect sizes. A meta-analysis of the European GLGC and our Hispanic datasets identified five novel regions reaching genome-wide significance: two for total cholesterol (FN1 and SAMM50), two for HDL cholesterol (LOC100996634 and COPB1) and one for LDL cholesterol (LINC00324/CTC1/PFAS). The top meta-analysis signals were found to be enriched for SNPs associated with gene expression in a tissue-specific fashion, suggesting an enrichment of tissue-specific function in lipid-associated loci.
View details for DOI 10.1038/srep19429
View details for PubMedID 26780889
View details for PubMedCentralID PMC4726092
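The abstract does not state which meta-analysis model was used. As a minimal sketch, assuming a standard inverse-variance-weighted fixed-effect model, per-study effect estimates for one SNP can be combined as follows (the function name and input values are hypothetical):

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for a single SNP.

    betas: per-study effect estimates; ses: their standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical effect sizes and standard errors from three cohorts
print(fixed_effect_meta([0.10, 0.14, 0.08], [0.05, 0.04, 0.06]))
```

More precise studies (smaller standard errors) dominate the pooled estimate; a random-effects model would be the usual alternative if effects are expected to differ between populations.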
-
PREMIX: PRivacy-preserving EstiMation of Individual admiXture.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2016; 2016: 1747-1755
Abstract
In this paper, we propose a framework, PRivacy-preserving EstiMation of Individual admiXture (PREMIX), that uses Intel Software Guard Extensions (SGX). SGX is a suite of software and hardware architectures that enables efficient and secure computation over confidential data. PREMIX enables multiple sites to collaborate securely on estimating individual admixture within a secure enclave inside Intel SGX. We implemented a feature selection module that identifies the most discriminative single nucleotide polymorphisms (SNPs) based on informativeness, and an Expectation Maximization (EM)-based maximum likelihood estimator of individual admixture. Experimental results on both simulated data and 1000 Genomes data demonstrated the efficiency and accuracy of the proposed framework. PREMIX ensures a high level of security because all operations on sensitive genomic data are conducted within a secure enclave using SGX.
View details for PubMedID 28269933
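The abstract names an EM-based maximum likelihood estimator but gives no further detail. As a sketch under stated assumptions (biallelic, independent SNPs; known ancestral allele frequencies strictly between 0 and 1; no SGX enclave and no informativeness-based feature selection modeled), individual admixture proportions can be estimated by EM as below. `estimate_admixture` is a hypothetical name:

```python
def estimate_admixture(genotypes, freqs, n_iter=200, tol=1e-8):
    """EM estimate of one individual's ancestry proportions.

    genotypes[j]: count (0, 1 or 2) of the reference allele at SNP j.
    freqs[j][k]: reference-allele frequency of SNP j in ancestral population k,
                 assumed known and strictly between 0 and 1.
    Returns a list of K proportions that sum to 1.
    """
    n_pops = len(freqs[0])
    q = [1.0 / n_pops] * n_pops  # start from equal ancestry
    for _ in range(n_iter):
        expected = [0.0] * n_pops
        for g, f in zip(genotypes, freqs):
            p_ref = sum(qk * fk for qk, fk in zip(q, f))
            p_alt = 1.0 - p_ref
            for k in range(n_pops):
                # E-step: share of each observed allele copy attributed to population k
                if g > 0:
                    expected[k] += g * q[k] * f[k] / p_ref
                if g < 2:
                    expected[k] += (2 - g) * q[k] * (1.0 - f[k]) / p_alt
        # M-step: each SNP contributes two allele copies in total
        new_q = [e / (2 * len(genotypes)) for e in expected]
        if max(abs(a - b) for a, b in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q


# Hypothetical individual genotyped at four SNPs with two reference populations
print(estimate_admixture([0, 1, 2, 1], [[0.9, 0.2], [0.7, 0.1], [0.8, 0.3], [0.6, 0.4]]))
```

Markers with large frequency differences between populations contribute the most information, which is the motivation for selecting informative SNPs before running the estimator.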
-
PLOS Genetics Data Sharing Policy: In Pursuit of Functional Utility
PLOS GENETICS
2015; 11 (12): e1005716
View details for PubMedID 26655768
-
Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans
GENOME RESEARCH
2015; 25 (11): 1610-1621
Abstract
Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin have been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy; many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation.
View details for DOI 10.1101/gr.193342.115
View details for Web of Science ID 000364355600003
View details for PubMedID 26297486
View details for PubMedCentralID PMC4617958
-
Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort
GENETICS
2015; 200 (4): 1285-1295
Abstract
Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian-European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent-child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent-child pairs was largely due to intermarriage. The parent-child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.
View details for DOI 10.1534/genetics.115.178616
View details for Web of Science ID 000359917000024
View details for PubMedID 26092716
View details for PubMedCentralID PMC4574246
-
Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.
Genetics
2015; 200 (4): 1285-1295
Abstract
Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian-European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent-child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent-child pairs was largely due to intermarriage. The parent-child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.
View details for DOI 10.1534/genetics.115.178616
View details for PubMedID 26092716
View details for PubMedCentralID PMC4574246
-
Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction.
Nature
2015; 518 (7537): 102-106
Abstract
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component of risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg/dL. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
View details for DOI 10.1038/nature13917
View details for PubMedID 25487149
-
Variants for HDL-C, LDL-C, and Triglycerides Identified from Admixture Mapping and Fine-Mapping Analysis in African American Families
CIRCULATION-CARDIOVASCULAR GENETICS
2015; 8 (1): 106-U206
Abstract
Admixture mapping of lipids was followed up by family-based association analysis to identify variants for cardiovascular disease in African Americans. The present study conducted admixture mapping analysis for total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides. The analysis was performed in 1905 unrelated African American subjects from the National Heart, Lung and Blood Institute's Family Blood Pressure Program (FBPP). Regions showing admixture evidence were followed up with family-based association analysis in 3556 African American subjects from the FBPP. The admixture mapping and family-based association analyses were adjusted for age, age², sex, body mass index, and genome-wide mean ancestry to minimize confounding caused by population stratification. Regions suggestive of local ancestry association were found on chromosomes 7 (low-density lipoprotein cholesterol), 8 (high-density lipoprotein cholesterol), 14 (triglycerides), and 19 (total cholesterol and triglycerides). In the fine-mapping analysis, 52 939 single-nucleotide polymorphisms (SNPs) were tested, and 11 SNPs (8 independent SNPs) showed nominally significant association with high-density lipoprotein cholesterol (2 SNPs), low-density lipoprotein cholesterol (4 SNPs), and triglycerides (5 SNPs). The family data were used in the fine-mapping to identify SNPs that showed novel associations with lipids, and regions including genes with known associations with cardiovascular disease. This study identified regions on chromosomes 7, 8, 14, and 19 and 11 SNPs from the fine-mapping analysis that were associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides for further studies of cardiovascular disease in African Americans.
View details for DOI 10.1161/CIRCGENETICS.114.000481
View details for Web of Science ID 000349873200015
View details for PubMedID 25552592
View details for PubMedCentralID PMC4378661
-
Trans-ethnic meta-analysis of white blood cell phenotypes
HUMAN MOLECULAR GENETICS
2014; 23 (25): 6944-6960
Abstract
White blood cell (WBC) count is a common clinical measure used as a predictor of certain aspects of human health, including immunity and infection status. WBC count is also a complex trait that varies among individuals and ancestry groups. Differences in linkage disequilibrium structure and heterogeneity in allelic effects are expected to play a role in the associations observed between populations. Prior genome-wide association study (GWAS) meta-analyses have identified genomic loci associated with WBC and its subtypes, but much of the heritability of these phenotypes remains unexplained. Using GWAS summary statistics for over 50 000 individuals from three diverse populations (Japanese, African-American and European ancestry), a Bayesian model methodology was employed to account for heterogeneity between ancestry groups. This approach was used to perform a trans-ethnic meta-analysis of total WBC, neutrophil and monocyte counts. Ten previously known associations were replicated and six new loci were identified, including several regions harboring genes related to inflammation and immune cell function. Ninety-five percent credible interval regions were calculated to narrow the association signals and fine-map the putatively causal variants within loci. Finally, a conditional analysis was performed on the most significant SNPs identified by the trans-ethnic meta-analysis (MA), and nine secondary signals within loci previously associated with WBC or its subtypes were identified. This work illustrates the potential of trans-ethnic analysis and ascribes a critical role to multi-ethnic cohorts and consortia in exploring complex phenotypes with respect to variants that lie outside the European-biased GWAS pool.
View details for PubMedID 25096241
View details for PubMedCentralID PMC4245044
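The abstract does not describe how the 95% credible regions were constructed. A common procedure, shown here under the assumptions of a single causal variant per locus and an equal prior for every SNP, converts per-SNP Bayes factors (such as those from a Bayesian trans-ethnic meta-analysis) into posterior probabilities and accumulates them, largest first, until the target coverage is reached. The function name and inputs are hypothetical:

```python
import math


def credible_set(log_bfs, coverage=0.95):
    """Smallest set of SNPs whose posterior probabilities sum to >= coverage.

    log_bfs: natural-log Bayes factors, one per SNP in the locus.
    Assumes one causal variant and equal priors, so posterior_i = BF_i / sum_j BF_j.
    Returns (indices in the set, posterior probabilities for all SNPs).
    """
    top = max(log_bfs)
    bfs = [math.exp(lb - top) for lb in log_bfs]  # subtract max for numerical stability
    total = sum(bfs)
    post = [b / total for b in bfs]
    order = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += post[i]
        if cumulative >= coverage:
            break
    return chosen, post


# Hypothetical locus with four SNPs
print(credible_set([10.0, 2.0, 9.0, 0.0]))
```

A smaller credible set means the association signal has been narrowed to fewer candidate causal variants.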
-
Genome-wide association and admixture analysis of glaucoma in the Women's Health Initiative
HUMAN MOLECULAR GENETICS
2014; 23 (24): 6634-6643
Abstract
We report a genome-wide association study (GWAS) and admixture analysis of glaucoma in 12 008 African-American and Hispanic women (age 50-79 years) from the Women's Health Initiative (WHI). Although GWAS of glaucoma have been conducted on several populations, this is the first to look at glaucoma in individuals of African-American and Hispanic race/ethnicity. Prevalent and incident glaucoma were determined by self-report from study questionnaires administered at baseline (1993-1998) and annually through 2005. For African Americans, there were 658 prevalent cases, 1062 incident cases and 6067 individuals who never progressed to glaucoma. For our replication cohort, we used the WHI Hispanics, including 153 prevalent cases, 336 incident cases and 2685 non-cases. We found an association of African ancestry with glaucoma incidence in African Americans (hazard ratio 1.62, 95% CI 1.023-2.56, P = 0.038) and in Hispanics (hazard ratio 3.21, 95% CI 1.32-7.80, P = 0.011). Although none of the previously identified glaucoma SNPs replicated in either the WHI African Americans or Hispanics, a risk score combining all previously reported hits was significant in African-American prevalent cases (P = 0.0046), and was in the expected direction in the incident cases, as well as in the Hispanic incident cases. Additionally, after imputing to 1000 Genomes, two less common independent SNPs were suggestive in African Americans, but had too low an allele frequency in Hispanics to test for replication. These results suggest the possibility of a distinct genetic architecture underlying glaucoma in individuals of African ancestry.
View details for DOI 10.1093/hmg/ddu364
View details for PubMedID 25027321
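The abstract does not say how the risk score combining previously reported hits was constructed. A common construction, shown here only as an assumption, is a weighted sum of risk-allele dosages with weights such as published log odds ratios; setting every weight to 1 gives an unweighted risk-allele count. The function name and values are hypothetical:

```python
def genetic_risk_score(dosages, weights=None):
    """Sum of risk-allele dosages times per-SNP weights.

    dosages: risk-allele counts per SNP (0-2; fractional if imputed).
    weights: per-SNP weights, e.g. published log odds ratios.
             With weights=None every SNP gets weight 1 (unweighted count).
    """
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(dosages) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual scored on three previously reported SNPs
print(genetic_risk_score([0, 1, 2], [0.10, 0.20, 0.30]))
```

The score can then enter a regression on case status like any other covariate, so many individually non-replicating SNPs can show a joint effect.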
-
Leveraging population admixture to characterize the heritability of complex traits
NATURE GENETICS
2014; 46 (12): 1356-1362
Abstract
Despite recent progress on estimating the heritability explained by genotyped SNPs ($h^2_g$), a large gap between $h^2_g$ and estimates of total narrow-sense heritability ($h^2$) remains. Explanations for this gap include rare variants or upward bias in family-based estimates of $h^2$ due to shared environment or epistasis. We estimate $h^2$ from unrelated individuals in admixed populations by first estimating the heritability explained by local ancestry ($h^2_\gamma$). We show that $h^2_\gamma = 2F_{STC}\,\theta(1-\theta)\,h^2$, where $F_{STC}$ measures frequency differences between populations at causal loci and $\theta$ is the genome-wide ancestry proportion. Our approach is not susceptible to biases caused by epistasis or shared environment. We applied this approach to the analysis of 13 phenotypes in 21,497 African-American individuals from 3 cohorts. For height and body mass index (BMI), we obtained $h^2$ estimates of 0.55 ± 0.09 and 0.23 ± 0.06, respectively, which are larger than estimates of $h^2_g$ in these and other data but smaller than family-based estimates of $h^2$.
View details for DOI 10.1038/ng.3139
View details for Web of Science ID 000345547300019
View details for PubMedCentralID PMC4244251
-
Leveraging population admixture to characterize the heritability of complex traits.
Nature genetics
2014; 46 (12): 1356-1362
Abstract
Despite recent progress on estimating the heritability explained by genotyped SNPs ($h^2_g$), a large gap between $h^2_g$ and estimates of total narrow-sense heritability ($h^2$) remains. Explanations for this gap include rare variants or upward bias in family-based estimates of $h^2$ due to shared environment or epistasis. We estimate $h^2$ from unrelated individuals in admixed populations by first estimating the heritability explained by local ancestry ($h^2_\gamma$). We show that $h^2_\gamma = 2F_{STC}\,\theta(1-\theta)\,h^2$, where $F_{STC}$ measures frequency differences between populations at causal loci and $\theta$ is the genome-wide ancestry proportion. Our approach is not susceptible to biases caused by epistasis or shared environment. We applied this approach to the analysis of 13 phenotypes in 21,497 African-American individuals from 3 cohorts. For height and body mass index (BMI), we obtained $h^2$ estimates of 0.55 ± 0.09 and 0.23 ± 0.06, respectively, which are larger than estimates of $h^2_g$ in these and other data but smaller than family-based estimates of $h^2$.
View details for DOI 10.1038/ng.3139
View details for PubMedID 25383972
View details for PubMedCentralID PMC4244251
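The relationship $h^2_\gamma = 2F_{STC}\,\theta(1-\theta)\,h^2$ stated in the abstract can be inverted to recover $h^2$ from an estimate of $h^2_\gamma$, given $F_{STC}$ and $\theta$. The sketch below only does this algebra; the input values are hypothetical, and estimating $h^2_\gamma$ and $F_{STC}$ from data (the substance of the method) is not shown:

```python
def local_ancestry_h2(h2: float, fst_c: float, theta: float) -> float:
    """Heritability explained by local ancestry: h2_gamma = 2 * F_STC * theta * (1 - theta) * h2."""
    return 2.0 * fst_c * theta * (1.0 - theta) * h2


def h2_from_local_ancestry(h2_gamma: float, fst_c: float, theta: float) -> float:
    """Invert the relation above to recover total narrow-sense heritability h2."""
    scale = 2.0 * fst_c * theta * (1.0 - theta)
    if scale <= 0.0:
        raise ValueError("need 0 < theta < 1 and F_STC > 0")
    return h2_gamma / scale


# Hypothetical inputs: genome-wide ancestry proportion 0.8, F_STC = 0.1, h2 = 0.5
h2_gamma = local_ancestry_h2(0.5, fst_c=0.1, theta=0.8)
print(h2_gamma, h2_from_local_ancestry(h2_gamma, fst_c=0.1, theta=0.8))
```

Because $\theta(1-\theta) \le 1/4$, the scale factor is at most $F_{STC}/2$, so $h^2_\gamma$ is a small fraction of $h^2$ whenever $F_{STC}$ is small.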
-
The Association of the Vanin-1 N131S Variant with Blood Pressure Is Mediated by Endoplasmic Reticulum-Associated Degradation and Loss of Function
PLOS GENETICS
2014; 10 (9)
Abstract
High blood pressure (BP) is the most common cardiovascular risk factor worldwide and a major contributor to heart disease and stroke. We previously discovered a BP-associated missense SNP (single nucleotide polymorphism), rs2272996, in the gene encoding vanin-1, a glycosylphosphatidylinositol (GPI)-anchored membrane pantetheinase. In the present study, we first replicated the association of rs2272996 and BP traits with a total sample size of nearly 30,000 individuals from the Continental Origins and Genetic Epidemiology Network (COGENT) of African Americans (P=0.01). This association was further validated using patient plasma samples: the N131S mutation is associated with significantly lower plasma vanin-1 protein levels. We found that N131S vanin-1 undergoes rapid endoplasmic reticulum-associated degradation (ERAD), which underlies its reduction. Using HEK293 cells stably expressing vanin-1 variants, we showed that N131S vanin-1 was degraded significantly faster than wild type (WT) vanin-1. Consequently, only minimal quantities of variant vanin-1 were present on the plasma membrane, and pantetheinase activity was greatly reduced. Application of MG-132, a proteasome inhibitor, resulted in accumulation of ubiquitinated variant protein. A further experiment demonstrated that atenolol and diltiazem, two drugs currently used to treat hypertension, reduce vanin-1 protein levels. Our study provides strong biological evidence for the association of the identified SNP with BP and suggests that vanin-1 misfolding and degradation are the underlying molecular mechanism.
View details for DOI 10.1371/journal.pgen.1004641
View details for Web of Science ID 000343009600051
View details for PubMedCentralID PMC4169380
-
The association of the vanin-1 N131S variant with blood pressure is mediated by endoplasmic reticulum-associated degradation and loss of function.
PLoS genetics
2014; 10 (9): e1004641
Abstract
High blood pressure (BP) is the most common cardiovascular risk factor worldwide and a major contributor to heart disease and stroke. We previously discovered a BP-associated missense SNP (single nucleotide polymorphism), rs2272996, in the gene encoding vanin-1, a glycosylphosphatidylinositol (GPI)-anchored membrane pantetheinase. In the present study, we first replicated the association of rs2272996 and BP traits with a total sample size of nearly 30,000 individuals from the Continental Origins and Genetic Epidemiology Network (COGENT) of African Americans (P=0.01). This association was further validated using patient plasma samples: the N131S mutation is associated with significantly lower plasma vanin-1 protein levels. We found that N131S vanin-1 undergoes rapid endoplasmic reticulum-associated degradation (ERAD), which underlies its reduction. Using HEK293 cells stably expressing vanin-1 variants, we showed that N131S vanin-1 was degraded significantly faster than wild type (WT) vanin-1. Consequently, only minimal quantities of variant vanin-1 were present on the plasma membrane, and pantetheinase activity was greatly reduced. Application of MG-132, a proteasome inhibitor, resulted in accumulation of ubiquitinated variant protein. A further experiment demonstrated that atenolol and diltiazem, two drugs currently used to treat hypertension, reduce vanin-1 protein levels. Our study provides strong biological evidence for the association of the identified SNP with BP and suggests that vanin-1 misfolding and degradation are the underlying molecular mechanism.
View details for DOI 10.1371/journal.pgen.1004641
View details for PubMedID 25233454
View details for PubMedCentralID PMC4169380
-
Modeling 3D facial shape from DNA.
PLoS genetics
2014; 10 (3)
Abstract
Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks to measure face shape in population samples with mixed West African and European ancestry from three locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables, which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting normal-range facial features and for approximating the appearance of a face from genetic markers.
View details for DOI 10.1371/journal.pgen.1004224
View details for PubMedID 24651127
View details for PubMedCentralID PMC3961191
-
Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol.
American journal of human genetics
2014; 94 (2): 233-245
Abstract
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
View details for DOI 10.1016/j.ajhg.2014.01.010
View details for PubMedID 24507775
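The abstract does not give the exact burden statistic. A minimal collapsing burden test, under assumptions chosen for illustration (a 1% MAF cutoff, a continuous trait, simple linear regression without covariates, and a normal approximation for the test statistic), counts each individual's rare alleles in a gene and regresses the trait on that count. Function names and inputs are hypothetical:

```python
import math


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Per-individual count of minor alleles at variants with MAF below the threshold.

    genotypes[i][j]: minor-allele count (0, 1 or 2) of variant j in individual i.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def burden_regression(scores, trait):
    """OLS of trait on burden score; returns (slope, se, z, two_sided_p)."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    mx = sum(scores) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("burden scores do not vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation to the t test
    return slope, se, z, p


# Hypothetical data: four individuals, three variants in one gene
scores = burden_scores([[0, 1, 0], [1, 0, 0], [1, 1, 1], [2, 0, 1]], [0.004, 0.2, 0.008])
print(scores, burden_regression(scores, [120.0, 135.0, 150.0, 171.0]))
```

Collapsing variants into one score per gene gains power when most rare variants act in the same direction, at the cost of not identifying which variants drive the signal.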
-
A Variational Bayes Discrete Mixture Test for Rare Variant Association
GENETIC EPIDEMIOLOGY
2014; 38 (1): 21-30
Abstract
Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variation within a predefined region, such as a gene. Although there is evidence that "aggregate" tests are more powerful than single-marker tests, these tests generally ignore neutral variants and therefore are unable to identify the specific variants driving the association with phenotype. We propose a novel aggregate rare-variant test that explicitly models a fraction of variants as neutral, tests associations at the gene level, and infers the rare variants driving the association. Simulations show that, in the practical scenario where a given region of the genome contains many variants of which only a fraction are causal, our approach has greater power than other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome-wide analyses, a significant computational advantage over exact-inference model selection methodologies. To demonstrate the efficacy of our methodology, we test for associations between von Willebrand Factor (VWF) levels and rare missense variants within the VWF gene, imputed from the National Heart, Lung, and Blood Institute's Exome Sequencing Project into 2,487 African Americans. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.
View details for DOI 10.1002/gepi.21772
View details for Web of Science ID 000328463200003
View details for PubMedID 24482836
-
ACCURATE CONSTRUCTION OF LONG RANGE HAPLOTYPE IN UNRELATED INDIVIDUALS
STATISTICA SINICA
2013; 23 (4): 1441-1461
View details for DOI 10.5705/ss.2012.141s
View details for Web of Science ID 000339125900002
-
Multiethnic Meta-Analysis of Genome-Wide Association Studies in > 100 000 Subjects Identifies 23 Fibrinogen-Associated Loci but No Strong Evidence of a Causal Association Between Circulating Fibrinogen and Cardiovascular Disease
CIRCULATION
2013; 128 (12): 1310-1324
Abstract
Estimates of the heritability of plasma fibrinogen concentration, an established predictor of cardiovascular disease, range from 34% to 50%. Genetic variants so far identified by genome-wide association studies explain only a small proportion (<2%) of its variation. We conducted a meta-analysis of 28 genome-wide association studies including >90 000 subjects of European ancestry, the first genome-wide association meta-analysis of fibrinogen levels in 7 studies in blacks totaling 8289 samples, and a genome-wide association study in Hispanics totaling 1366 samples. Evaluation for association of single-nucleotide polymorphisms with clinical outcomes included a total of 40 695 cases and 85 582 controls for coronary artery disease, 4752 cases and 24 030 controls for stroke, and 3208 cases and 46 167 controls for venous thromboembolism. Overall, we identified 24 genome-wide significant (P < 5 × 10⁻⁸) independent signals in 23 loci, including 15 novel associations, together accounting for 3.7% of plasma fibrinogen variation. Gene-set enrichment analysis highlighted key roles in fibrinogen regulation for the 3 structural fibrinogen genes and pathways related to inflammation, adipocytokines, and thyrotrophin-releasing hormone signaling. Whereas lead single-nucleotide polymorphisms in a few loci were significantly associated with coronary artery disease, the combined effect of all 24 fibrinogen-associated lead single-nucleotide polymorphisms was not significant for coronary artery disease, stroke, or venous thromboembolism. We identify 23 robustly associated fibrinogen loci, 15 of which are new. Clinical outcome analysis of these loci does not support a causal relationship between circulating levels of fibrinogen and coronary artery disease, stroke, or venous thromboembolism.
View details for DOI 10.1161/CIRCULATIONAHA.113.002251
View details for Web of Science ID 000324477900015
View details for PubMedID 23969696
-
Genome-wide Association Analysis of Blood-Pressure Traits in African-Ancestry Individuals Reveals Common Associated Genes in African and Non-African Populations
AMERICAN JOURNAL OF HUMAN GENETICS
2013; 93 (3): 545-554
Abstract
High blood pressure (BP) is more prevalent and contributes to more severe manifestations of cardiovascular disease (CVD) in African Americans than in any other United States ethnic group. Several small African-ancestry (AA) BP genome-wide association studies (GWASs) have been published, but their findings have failed to replicate to date. We report on a large AA BP GWAS meta-analysis that includes 29,378 individuals from 19 discovery cohorts and subsequent replication in additional samples of AA (n = 10,386), European ancestry (EA) (n = 69,395), and East Asian ancestry (n = 19,601). Five loci (EVX1-HOXA, ULK4, RSPO3, PLEKHG1, and SOX6) reached genome-wide significance (p < 1.0 × 10⁻⁸) for either systolic or diastolic BP in a transethnic meta-analysis after correction for multiple testing. Three of these BP loci (EVX1-HOXA, RSPO3, and PLEKHG1) lack previous associations with BP. We also identified one independent signal in a known BP locus (SOX6) and provide evidence for fine mapping in four additional validated BP loci. We also demonstrate that validated EA BP GWAS loci, considered jointly, show significant effects in AA samples. Consequently, these findings suggest that BP loci might have universal effects across studied populations, demonstrating that multiethnic samples are an essential component in identifying, fine mapping, and understanding their trait variability.
View details for DOI 10.1016/j.ajhg.2013.07.010
View details for Web of Science ID 000330268900014
View details for PubMedID 23972371
View details for PubMedCentralID PMC3769920
-
African American race but not genome-wide ancestry is negatively associated with atrial fibrillation among postmenopausal women in the Women's Health Initiative.
American heart journal
2013; 166 (3): 566-572
Abstract
Atrial fibrillation (AF) is the most common arrhythmia in women and is associated with higher rates of stroke and death. Rates of AF are lower in African American subjects compared with European Americans, suggesting European ancestry could contribute to AF risk. The Women's Health Initiative (WHI) Observational Study (OS) followed up 93,676 women since the mid 1990s for various cardiovascular outcomes including AF. Multivariate Cox hazard regression analysis was used to measure the association between African American race and incident AF. A total of 8,119 African American women from the WHI randomized clinical trials and OS were genotyped on the Affymetrix Human SNP Array 6.0. Genome-wide ancestry and previously reported single nucleotide polymorphisms associated with AF in European cohorts were tested for association with AF using multivariate logistic regression analyses. Self-reported African American race was associated with lower rates of AF (hazard ratio 0.43, 95% CI 0.32-0.60) in the OS, independent of demographic and clinical risk factors. In the genotyped cohort, there were 558 women with AF. By contrast, genome-wide European ancestry was not associated with AF. None of the single nucleotide polymorphisms previously associated with AF in European populations, including rs2200733, were associated with AF in the WHI African American cohort. African American race is significantly and inversely correlated with AF in postmenopausal women. The etiology of this association remains unclear and may be related to unidentified environmental differences. Larger studies are necessary to identify genetic determinants of AF in African Americans.
View details for DOI 10.1016/j.ahj.2013.05.024
View details for PubMedID 24016508
-
Variation and genetic control of protein abundance in humans
NATURE
2013; 499 (7456): 79-82
Abstract
Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.
View details for DOI 10.1038/nature12223
View details for Web of Science ID 000321285600037
View details for PubMedID 23676674
-
Genome-wide association analysis of red blood cell traits in African Americans: the COGENT Network
HUMAN MOLECULAR GENETICS
2013; 22 (12): 2529-2538
Abstract
Laboratory red blood cell (RBC) measurements are clinically important, heritable and differ among ethnic groups. To identify genetic variants that contribute to RBC phenotypes in African Americans (AAs), we conducted a genome-wide association study in up to ~16 500 AAs. The alpha-globin locus on chromosome 16pter [lead SNP rs13335629 in ITFG3 gene; P < 1E-13 for hemoglobin (Hgb), RBC count, mean corpuscular volume (MCV), MCH and MCHC] and the G6PD locus on Xq28 [lead SNP rs1050828; P < 1E-13 for Hgb, hematocrit (Hct), MCV, RBC count and red cell distribution width (RDW)] were each associated with multiple RBC traits. At the alpha-globin region, both the common African 3.7 kb deletion and common single nucleotide polymorphisms (SNPs) appear to contribute independently to RBC phenotypes among AAs. In the 2p21 region, we identified a novel variant of PRKCE distinctly associated with Hct in AAs. In a genome-wide admixture mapping scan, local European ancestry at the 6p22 region containing HFE and LRRC16A was associated with higher Hgb. LRRC16A has been previously associated with the platelet count and mean platelet volume in AAs, but not with Hgb. Finally, we extended to AAs the findings of association of erythrocyte traits with several loci previously reported in Europeans and/or Asians, including CD164 and HBS1L-MYB. In summary, this large-scale genome-wide analysis in AAs has extended the importance of several RBC-associated genetic loci to AAs and identified allelic heterogeneity and pleiotropy at several previously known genetic loci associated with blood cell traits in AAs.
View details for DOI 10.1093/hmg/ddt087
View details for Web of Science ID 000319432800018
View details for PubMedID 23446634
View details for PubMedCentralID PMC3658166
-
Genome-wide Characterization of Shared and Distinct Genetic Components that Influence Blood Lipid Levels in Ethnically Diverse Human Populations
AMERICAN JOURNAL OF HUMAN GENETICS
2013; 92 (6): 904-916
Abstract
Blood lipid concentrations are heritable risk factors associated with atherosclerosis and cardiovascular diseases. Lipid traits exhibit considerable variation among populations of distinct ancestral origin as well as between individuals within a population. We performed association analyses to identify genetic loci influencing lipid concentrations in African American and Hispanic American women in the Women's Health Initiative SNP Health Association Resource. We validated one African-specific high-density lipoprotein cholesterol locus at CD36 as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations.
View details for DOI 10.1016/j.ajhg.2013.04.025
View details for Web of Science ID 000320415300007
View details for PubMedCentralID PMC3675231
-
Association of DXA-derived Bone Mineral Density and Fat Mass With African Ancestry
JOURNAL OF CLINICAL ENDOCRINOLOGY & METABOLISM
2013; 98 (4): E713-E717
Abstract
Both genes and environment have been implicated in determining complex body composition phenotypes in individuals of European ancestry; however, few studies have been conducted in other racial/ethnic groups. We conducted a genome-wide admixture mapping study in an attempt to localize novel genomic regions associated with genetic ancestry. We selected a sample of 842 African-American women from the Women's Health Initiative single nucleotide polymorphism (SNP) Health Association Resource for whom several dual-energy X-ray absorptiometry (DXA)-derived bone mineral density (BMD) and fat mass phenotypes were available. We derived both global and local ancestry estimates for each individual from Affymetrix 6.0 data and analyzed the correlation of DXA phenotypes with global African ancestry. For each phenotype, we examined the association between local genetic ancestry (number of African ancestral alleles at each marker) and the DXA phenotype at 570 282 markers across the genome in additive models with adjustment for important covariates. We identified statistically significant correlations of whole-body fat mass, trunk fat mass, and all 6 measures of BMD with the proportion of African ancestry. Genome-wide (admixture) significance for femoral neck BMD was achieved across 2 regions of ∼3.7 Mb and 0.3 Mb on chromosome 19q13; similarly, total hip and intertrochanter BMD were associated with local ancestry in these regions. Trunk fat was the most significant fat mass phenotype, showing strong but not genome-wide significant associations on chromosome Xp22. Our results suggest that genomic regions in postmenopausal African-American women contribute to variance in BMD and fat mass and warrant further study.
View details for DOI 10.1210/jc.2012-3921
View details for Web of Science ID 000317195600014
View details for PubMedID 23436924
View details for PubMedCentralID PMC3615193
-
Imputation of Exome Sequence Variants into Population-Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project
AMERICAN JOURNAL OF HUMAN GENETICS
2012; 91 (5): 794-808
Abstract
Researchers have successfully applied exome sequencing to discover causal variants in selected individuals with familial, highly penetrant disorders. We demonstrate the utility of exome sequencing followed by imputation for discovering low-frequency variants associated with complex quantitative traits. We performed exome sequencing in a reference panel of 761 African Americans and then imputed newly discovered variants into a larger sample of more than 13,000 African Americans for association testing with the blood cell traits hemoglobin, hematocrit, white blood cell count, and platelet count. First, we illustrate the feasibility of our approach by demonstrating genome-wide-significant associations for variants that are not covered by conventional genotyping arrays; for example, higher platelet count is associated with the MPL c.117G>T variant, which encodes a p.Lys39Asn amino acid substitution in the thrombopoietin receptor (p = 1.5 × 10(-11)). Second, we identified an association between missense variants of LCT and higher white blood cell count (p = 4 × 10(-13)). Third, we identified low-frequency coding variants that might account for allelic heterogeneity at several known blood cell-associated loci: MPL c.754T>C (p.Tyr252His) was associated with higher platelet count; CD36 c.975T>G (p.Tyr325(∗)) was associated with lower platelet count; and several missense variants at the α-globin gene locus were associated with lower hemoglobin. By identifying low-frequency missense variants associated with blood cell traits not previously reported by genome-wide association studies, we establish that exome sequencing followed by imputation is a powerful approach to dissecting complex, genetically heterogeneous traits in large population-based studies.
View details for DOI 10.1016/j.ajhg.2012.08.031
View details for Web of Science ID 000311011400003
View details for PubMedID 23103231
View details for PubMedCentralID PMC3487117
-
Genome-Wide Association Studies of Quantitatively Measured Skin, Hair, and Eye Pigmentation in Four European Populations
PLOS ONE
2012; 7 (10)
Abstract
Pigmentation of the skin, hair, and eyes varies both within and between human populations. Identifying the genes and alleles underlying this variation has been the goal of many candidate gene and several genome-wide association studies (GWAS). Most GWAS for pigmentary traits to date have been based on subjective phenotypes using categorical scales. But skin, hair, and eye pigmentation vary continuously. Here, we seek to characterize quantitative variation in these traits objectively and accurately and to determine their genetic basis. Objective and quantitative measures of skin, hair, and eye color were made using reflectance or digital spectroscopy in Europeans from Ireland, Poland, Italy, and Portugal. A GWAS was conducted for the three quantitative pigmentation phenotypes in 176 women across 313,763 SNP loci, and replication of the most significant associations was attempted in a sample of 294 European men and women from the same countries. We find that the pigmentation phenotypes are highly stratified along axes of European genetic differentiation. The country of sampling explains approximately 35% of the variation in skin pigmentation, 31% of the variation in hair pigmentation, and 40% of the variation in eye pigmentation. All three quantitative phenotypes are correlated with each other. In our two-stage association study, we reproduce the association of rs1667394 at the OCA2/HERC2 locus with eye color, but we do not identify new genetic determinants of skin and hair pigmentation. This supports the lack of major genes affecting skin and hair color variation within Europe and suggests that not only careful phenotyping but also larger cohorts are required to understand the genetic architecture of these complex quantitative traits. Interestingly, we also see that in each of these four populations, men are more lightly pigmented in the unexposed skin of the inner arm than women, a fact that is underappreciated and may vary across the world.
View details for DOI 10.1371/journal.pone.0048294
View details for Web of Science ID 000310600500094
View details for PubMedID 23118974
View details for PubMedCentralID PMC3485197
-
Variants in CXADR and F2RL1 are associated with blood pressure and obesity in African-Americans in regions identified through admixture mapping
JOURNAL OF HYPERTENSION
2012; 30 (10): 1970-1976
Abstract
Genetic variants in 296 genes in regions identified through admixture mapping of hypertension, BMI, and lipids were assessed for association with hypertension, blood pressure (BP), BMI, and high-density lipoprotein cholesterol (HDL-C). This study examined coding SNPs, selected from HapMap2 data, that were located in genes on chromosomes 5, 6, 8, and 21, in regions where previous admixture mapping studies had found ancestry association evidence for hypertension, BMI, or HDL-C. Genotyping was performed in 1733 unrelated African-Americans from the National Heart, Lung and Blood Institute's Family Blood Pressure Project, and gene-based association analyses were conducted for hypertension, SBP, DBP, BMI, and HDL-C. A gene score based on the number of minor alleles of each SNP in a gene was created and used for gene-based regression analyses, adjusting for age, sex, local marker ancestry, and BMI, as applicable. Each individual's African ancestry, estimated from 2507 ancestry-informative markers, was also adjusted for to eliminate any confounding due to population stratification. CXADR (rs437470) on chromosome 21 was associated with SBP and DBP with or without adjusting for local ancestry (P < 0.0006). F2RL1 (rs631465) on chromosome 5 was associated with BMI (P = 0.0005). Local ancestry in these regions was associated with the respective traits as well. This study suggests that CXADR and F2RL1 likely play important roles in BP and obesity variation, respectively; these findings are consistent with those of other studies, but replication and functional analyses are necessary.
View details for DOI 10.1097/HJH.0b013e3283578c80
View details for Web of Science ID 000308854500017
View details for PubMedID 22914544
View details for PubMedCentralID PMC3575678
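The gene-score regression described in this abstract can be illustrated with a short Python sketch. It assumes genotypes are already coded as minor-allele counts (0, 1 or 2) for the SNPs assigned to one gene, and it fits an ordinary least-squares model of the trait on the score plus covariates. The function names are hypothetical and this is not the authors' analysis code; in the study, covariates such as age, sex, local ancestry, BMI and global African ancestry would enter as extra columns.

```python
import numpy as np


def gene_score(genotypes):
    """Sum minor-allele counts (0, 1 or 2 per SNP) across the SNPs of one gene.

    genotypes: array of shape (n_individuals, n_snps).
    """
    return np.asarray(genotypes, dtype=float).sum(axis=1)


def gene_based_regression(trait, genotypes, covariates):
    """OLS of trait on an intercept, the gene score and covariates.

    covariates: array of shape (n_individuals, n_covariates).
    Returns (effect of the gene score, its standard error).
    """
    score = gene_score(genotypes)
    X = np.column_stack([np.ones_like(score), score,
                         np.asarray(covariates, dtype=float)])
    y = np.asarray(trait, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # OLS covariance of estimates
    return beta[1], np.sqrt(cov[1, 1])
```

Summing allele counts collapses several SNPs into one predictor per gene, so each gene costs a single test; the trade-off is that SNPs with opposite effects can cancel in the score.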
-
Genome-wide Association and Population Genetic Analysis of C-Reactive Protein in African American and Hispanic American Women
AMERICAN JOURNAL OF HUMAN GENETICS
2012; 91 (3): 502-512
Abstract
C-reactive protein (CRP) is a systemic inflammation marker that predicts future cardiovascular risk. CRP levels are higher in African Americans and Hispanic Americans than in European Americans, but the genetic determinants of CRP in these admixed United States minority populations are largely unknown. We performed genome-wide association studies (GWASs) of 8,280 African American (AA) and 3,548 Hispanic American (HA) postmenopausal women from the Women's Health Initiative SNP Health Association Resource. We discovered and validated a CRP-associated variant of triggering receptors expressed by myeloid cells 2 (TREM2) in chromosomal region 6p21 (p = 10(-10)). The TREM2 variant associated with higher CRP is common in Africa but rare in other ancestral populations. In AA women, the CRP region in 1q23 contained a strong admixture association signal (p = 10(-17)), which appears to be related to several independent CRP-associated alleles; the strongest of these is present only in African ancestral populations and is associated with higher CRP. Of the other genomic loci previously associated with CRP through GWASs of European populations, most loci (LEPR, IL1RN, IL6R, GCKR, NLRP3, HNF1A, HNF4A, and APOC1) showed consistent patterns of association with CRP in AA and HA women. In summary, we have identified a common TREM2 variant associated with CRP in United States minority populations. The genetic architecture underlying the CRP phenotype in AA women is complex and involves genetic variants shared across populations, as well as variants specific to populations of African descent.
View details for DOI 10.1016/j.ajhg.2012.07.023
View details for Web of Science ID 000308683100010
View details for PubMedID 22939635
View details for PubMedCentralID PMC3511984
-
Estimating Kinship in Admixed Populations
AMERICAN JOURNAL OF HUMAN GENETICS
2012; 91 (1): 122-138
Abstract
Genome-wide association studies (GWASs) are commonly used for the mapping of genetic loci that influence complex traits. A problem that is often encountered in both population-based and family-based GWASs is that of identifying cryptic relatedness and population stratification because it is well known that failure to appropriately account for both pedigree and population structure can lead to spurious association. A number of methods have been proposed for identifying relatives in samples from homogeneous populations. A strong assumption of population homogeneity, however, is often untenable, and many GWASs include samples from structured populations. Here, we consider the problem of estimating relatedness in structured populations with admixed ancestry. We propose a method, REAP (relatedness estimation in admixed populations), for robust estimation of identity by descent (IBD)-sharing probabilities and kinship coefficients in admixed populations. REAP appropriately accounts for population structure and ancestry-related assortative mating by using individual-specific allele frequencies at SNPs that are calculated on the basis of ancestry derived from whole-genome analysis. In simulation studies with related individuals and admixture from highly divergent populations, we demonstrate that REAP gives accurate IBD-sharing probabilities and kinship coefficients. We apply REAP to the Mexican Americans in Los Angeles, California (MXL) population sample of release 3 of phase III of the International Haplotype Map Project; in this sample, we identify third- and fourth-degree relatives who have not previously been reported. We also apply REAP to the African American and Hispanic samples from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) study, in which hundreds of pairs of cryptically related individuals have been identified.
View details for DOI 10.1016/j.ajhg.2012.05.024
View details for Web of Science ID 000306445000010
View details for PubMedID 22748210
View details for PubMedCentralID PMC3397261
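The central idea behind REAP, replacing a single population allele frequency with individual-specific frequencies derived from each person's ancestry proportions, can be sketched in Python. This is an illustrative method-of-moments kinship estimator written under stated assumptions (known ancestry proportions and ancestral allele frequencies, unlinked biallelic SNPs coded 0/1/2); it is not the published REAP software, it omits the IBD-sharing probabilities, and the function names are hypothetical. Under these assumptions an outbred individual compared with itself gives a value near 0.5 and an unrelated pair a value near 0.

```python
import numpy as np


def individual_allele_freqs(ancestry, pop_freqs):
    """Individual-specific allele frequencies from ancestry proportions.

    ancestry: (n_individuals, K) ancestry proportions, rows summing to 1.
    pop_freqs: (K, n_snps) allele frequencies in each ancestral population.
    Returns an (n_individuals, n_snps) array.
    """
    return np.asarray(ancestry, dtype=float) @ np.asarray(pop_freqs, dtype=float)


def kinship(g_i, g_j, mu_i, mu_j):
    """Moment estimate of the kinship coefficient for one pair.

    g_i, g_j: genotypes coded as allele counts 0/1/2.
    mu_i, mu_j: the two individuals' allele frequencies at the same SNPs.
    """
    g_i, g_j, mu_i, mu_j = (np.asarray(a, dtype=float)
                            for a in (g_i, g_j, mu_i, mu_j))
    # Centre each genotype on its own expected value 2 * mu.
    num = np.sum((g_i - 2 * mu_i) * (g_j - 2 * mu_j))
    den = 4 * np.sum(np.sqrt(mu_i * (1 - mu_i) * mu_j * (1 - mu_j)))
    return num / den
```

Centring each genotype on its own ancestry-based expectation is what keeps admixed but unrelated pairs from looking related; with a single pooled frequency, shared ancestry alone would inflate the estimate.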
-
Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes
CELL
2012; 148 (6): 1293-1307
Abstract
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
View details for DOI 10.1016/j.cell.2012.02.009
View details for PubMedID 22424236
-
Human genetic variation altering anthrax toxin sensitivity
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (8): 2972-2977
Abstract
The outcome of exposure to infectious microbes or their toxins is influenced by both microbial and host genes. Some host genes encode defense mechanisms, whereas others assist pathogen functions. Genomic analyses have associated host gene mutations with altered infectious disease susceptibility, but evidence for causality is limited. Here we demonstrate that human genetic variation affecting capillary morphogenesis gene 2 (CMG2), which encodes a host membrane protein exploited by anthrax toxin as a principal receptor, dramatically alters toxin sensitivity. Lymphoblastoid cells derived from a HapMap Project cohort of 234 persons of African, European, or Asian ancestry differed in sensitivity mediated by the protective antigen (PA) moiety of anthrax toxin by more than four orders of magnitude, with 99% of the cohort showing a 250-fold range of sensitivity. We find that relative sensitivity is an inherited trait that correlates strongly with CMG2 mRNA abundance in cells of each ethnic/geographical group and in the combined population pool (P = 4 × 10(-11)). The extent of CMG2 expression in transfected murine macrophages and human lymphoblastoid cells affected anthrax toxin binding, internalization, and sensitivity. A CMG2 single-nucleotide polymorphism (SNP) occurring frequently in African and European populations independently altered toxin uptake, but was not statistically associated with altered sensitivity in HapMap cell populations. Our results reveal extensive human diversity in cell lethality dependent on PA-mediated toxin binding and uptake, and identify individual differences in CMG2 expression level as a determinant of this diversity. Testing of genomically characterized human cell populations may offer a broadly useful strategy for elucidating effects of genetic variation on infectious disease susceptibility.
View details for DOI 10.1073/pnas.1121006109
View details for Web of Science ID 000300495100062
View details for PubMedID 22315420
View details for PubMedCentralID PMC3286947
-
Genome-wide association study of body height in African Americans: the Women's Health Initiative SNP Health Association Resource (SHARe)
HUMAN MOLECULAR GENETICS
2012; 21 (3): 711-720
Abstract
Height is a complex trait under strong genetic influence. To date, numerous genetic loci have been associated with height in individuals of European ancestry. However, few large-scale discovery genome-wide association studies (GWAS) of height in minority populations have been conducted, and thus information about population-specific height regulation is limited. We conducted a GWA analysis of height in 8149 African-American (AA) women from the Women's Health Initiative. Genetic variants with P < 5 × 10(-5) (n = 169) were followed up in a replication data set (n = 20 809) and meta-analyzed in a total of 28 958 AAs and African-descent individuals. Twelve single-nucleotide polymorphisms (SNPs) representing 7 independent loci were significantly associated with height at P < 5 × 10(-8). We identified novel SNPs in 17q23 (TMEM100/PCTP) and Xp22.3 (ARSE) reflecting population-specific regulation of height in AAs and replicated five loci previously reported in European-descent populations [4p15/LCORL, 11q13/SERPINH1, 12q14/HMGA2, 17q23/MAP3K3 (mitogen-activated protein kinase kinase kinase 3) and 18q21/DYM]. In addition, we performed an admixture mapping analysis of height, which is both complementary to and supportive of the GWA analysis and suggests potential associations between ancestry and height on chromosomes 4 (4q21), 15 (15q26) and 17 (17q23). Our findings provide insight into the genetic architecture of height and support the investigation of non-European-descent populations for identifying genetic factors associated with complex traits. Specifically, we identify new loci that may reflect population-specific regulation of height and report several known height loci that are important in determining height in African-descent populations.
View details for DOI 10.1093/hmg/ddr489
View details for Web of Science ID 000299351000020
View details for PubMedID 22021425
View details for PubMedCentralID PMC3259012
-
Genome-Wide Association Study of White Blood Cell Count in 16,388 African Americans: the Continental Origins and Genetic Epidemiology Network (COGENT)
PLOS GENETICS
2011; 7 (6)
Abstract
Total white blood cell (WBC) and neutrophil counts are lower among individuals of African descent due to the common African-derived "null" variant of the Duffy Antigen Receptor for Chemokines (DARC) gene. Additional common genetic polymorphisms were recently associated with total WBC and WBC sub-type levels in European and Japanese populations. No additional loci that account for WBC variability have been identified in African Americans. In order to address this, we performed a large genome-wide association study (GWAS) of total WBC and cell subtype counts in 16,388 African-American participants from 7 population-based cohorts available in the Continental Origins and Genetic Epidemiology Network. In addition to the DARC locus on chromosome 1q23, we identified two other regions (chromosomes 4q13 and 16q22) associated with WBC in African Americans (P<2.5×10(-8)). The lead SNP (rs9131) on chromosome 4q13 is located in the CXCL2 gene, which encodes a chemotactic cytokine for polymorphonuclear leukocytes. Independent evidence of the novel CXCL2 association with WBC was present in 3,551 Hispanic Americans, 14,767 Japanese, and 19,509 European Americans. The index SNP (rs12149261) on chromosome 16q22 associated with WBC count is located in a large inter-chromosomal segmental duplication encompassing part of the hydrocephalus inducing homolog (HYDIN) gene. We demonstrate that the chromosome 16q22 association finding is most likely due to a genotyping artifact as a consequence of sequence similarity between duplicated regions on chromosomes 16q22 and 1q21. Among the WBC loci recently identified in European or Japanese populations, replication was observed in our African-American meta-analysis for rs445 of CDK6 on chromosome 7q21 and rs4065321 of PSMD3-CSF3 region on chromosome 17q21. In summary, the CXCL2, CDK6, and PSMD3-CSF3 regions are associated with WBC count in African American and other populations. We also demonstrate that large inter-chromosomal duplications can result in false positive associations in GWAS.
View details for DOI 10.1371/journal.pgen.1002108
View details for Web of Science ID 000292386300027
View details for PubMedID 21738479
View details for PubMedCentralID PMC3128101
-
Joint Testing of Genotype and Ancestry Association in Admixed Families
GENETIC EPIDEMIOLOGY
2010; 34 (8): 783-791
Abstract
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure that efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.
View details for DOI 10.1002/gepi.20520
View details for Web of Science ID 000284719100002
View details for PubMedID 21031451
View details for PubMedCentralID PMC3103820
-
Lack of Association Between the Trp719Arg Polymorphism in Kinesin-Like Protein-6 and Coronary Artery Disease in 19 Case-Control Studies
JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY
2010; 56 (19): 1552-1563
Abstract
We sought to replicate the association between the kinesin-like protein 6 (KIF6) Trp719Arg polymorphism (rs20455) and clinical coronary artery disease (CAD). Recent prospective studies suggest that carriers of the 719Arg allele in KIF6 are at increased risk of clinical CAD compared with noncarriers. The KIF6 Trp719Arg polymorphism (rs20455) was genotyped in 19 case-control studies of nonfatal CAD, either as part of a genome-wide association study or in a formal attempt to replicate the initial positive reports. A total of 17,000 cases and 39,369 controls of European descent, as well as a modest number of South Asians, African Americans, Hispanics, East Asians, and admixed cases and controls, were successfully genotyped. None of the 19 studies demonstrated an increased risk of CAD in carriers of the 719Arg allele compared with noncarriers. Regression analyses and fixed-effects meta-analyses ruled out, with a high degree of confidence, an increase of ≥2% in the risk of CAD among European 719Arg carriers. We also observed no increase in the risk of CAD among 719Arg carriers in the subset of Europeans with early-onset disease (younger than 50 years of age for men and younger than 60 years of age for women) compared with similarly aged controls, or in any of the non-European subgroups. The KIF6 Trp719Arg polymorphism was not associated with the risk of clinical CAD in this large replication study.
View details for DOI 10.1016/j.jacc.2010.06.022
View details for PubMedID 20933357
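Fixed-effects meta-analysis, as mentioned in this abstract, is commonly carried out by inverse-variance weighting of per-study estimates. A minimal Python sketch follows, assuming each study reports a log odds ratio and its standard error; it is a generic illustration of the technique, not the pipeline used in this study.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios).
    ses: their standard errors.
    Returns (pooled estimate, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]    # precision of each study
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal p-value
    return beta, se, p
```

Larger, more precise studies dominate the pooled estimate, and the pooled standard error shrinks as studies are added, which is what allows a combined analysis to exclude small effects that no single study could.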
-
Detecting Genomic Aberrations Using Products in a Multiscale Analysis
BIOMETRICS
2010; 66 (3): 684-693
Abstract
Genomic instability, such as copy-number losses and gains, occurs in many genetic diseases. Recent technology developments enable researchers to measure copy numbers at tens of thousands of markers simultaneously. In this article, we propose a nonparametric approach for detecting the locations of copy-number changes and provide a measure of significance for each change point. The proposed test is based on seeking scale-based changes in the sequence of copy numbers, which is ordered by the marker locations along the chromosome. The method leads to a natural way to estimate the null distribution for the test of a change point and adjusted p-values for the significance of a change point using a step-down maxT permutation algorithm to control the family-wise error rate. A simulation study investigates the finite sample performance of the proposed method and compares it with a more standard sequential testing method. The method is illustrated using two real data sets.
View details for DOI 10.1111/j.1541-0420.2009.01337.x
View details for Web of Science ID 000281950000003
View details for PubMedID 19817738
View details for PubMedCentralID PMC2942992
-
Genome-Wide Association Study Implicates Chromosome 9q21.31 as a Susceptibility Locus for Asthma in Mexican Children
PLOS GENETICS
2009; 5 (8)
Abstract
Many candidate genes have been studied for asthma, but replication has varied. Novel candidate genes have been identified for various complex diseases using genome-wide association studies (GWASs). We conducted a GWAS in 492 Mexican children with asthma, predominantly atopic by skin prick test, and their parents using the Illumina HumanHap 550 K BeadChip to identify novel genetic variation for childhood asthma. The 520,767 autosomal single nucleotide polymorphisms (SNPs) passing quality control were tested for association with childhood asthma using log-linear regression with a log-additive risk model. Eleven of the most significantly associated GWAS SNPs were tested for replication in an independent study of 177 Mexican case-parent trios with childhood-onset asthma and atopy using log-linear analysis. The chromosome 9q21.31 SNP rs2378383 (p = 7.10 × 10⁻⁶ in the GWAS), located upstream of transducin-like enhancer of split 4 (TLE4), gave a p-value of 0.03 and the same direction and magnitude of association in the replication study (combined p = 6.79 × 10⁻⁷). Ancestry analysis on chromosome 9q supported an inverse association between the rs2378383 minor allele (G) and childhood asthma. This work identifies chromosome 9q21.31 as a novel susceptibility locus for childhood asthma in Mexicans. Further, analysis of genome-wide expression data in 51 human tissues from the Novartis Research Foundation showed that median GWAS significance levels for SNPs in genes expressed in the lung differed most significantly from genes not expressed in the lung when compared to 50 other tissues, supporting the biological plausibility of our overall GWAS findings and the multigenic etiology of childhood asthma.
View details for DOI 10.1371/journal.pgen.1000623
View details for Web of Science ID 000271533500037
View details for PubMedID 19714205
View details for PubMedCentralID PMC2722731
-
Admixture mapping of quantitative trait loci for blood lipids in African-Americans
HUMAN MOLECULAR GENETICS
2009; 18 (11): 2091-2098
Abstract
Blood lipid levels, including low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG), are highly heritable traits and major risk factors for atherosclerotic cardiovascular disease (CVD). Using individual ancestry estimates at marker locations across the genome, we present a novel quantitative admixture mapping analysis of all three lipid traits in a large sample of African-Americans from the Family Blood Pressure Program. Regression analysis was performed with both total and marker-location-specific European ancestry as explanatory variables, along with demographic covariates. Robust permutation analysis was used to assess statistical significance. Overall European ancestry was significantly correlated with HDL-C (negatively) and TG (positively), but not with LDL-C. We found strong evidence for a novel locus underlying HDL-C on chromosome 8q, which correlated negatively with European ancestry (P = .0014); the same location also showed positive correlation of European ancestry with TG levels. A region on chromosome 14q also showed significant negative correlation between HDL-C levels and European ancestry. On chromosome 15q, a suggestive negative correlation of European ancestry with TG and positive correlation with HDL-C was observed. Results with LDL-C were less significant overall. We also found significant evidence for genome-wide ancestry effects underlying the joint distribution of HDL-C and TG, not fully explained by the locus on chromosome 8. Our results are consistent with a genetic contribution to and may explain the healthier HDL-C and TG profiles found in Blacks versus Whites. The identified regions provide locations for follow-up studies of genetic variants underlying lipid variation in African-Americans and possibly other populations.
View details for DOI 10.1093/hmg/ddp122
View details for Web of Science ID 000265951600018
View details for PubMedID 19304782
View details for PubMedCentralID PMC2722229
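The regression design described in this abstract (and in the BMI study that follows) can be sketched roughly as below. This is a simplified illustration, not the authors' code: it assumes marker-location-specific ancestry has already been estimated, uses the excess local European ancestry (local minus genome-wide) as the explanatory variable with genome-wide ancestry and covariates as adjusters, and obtains a permutation p-value by shuffling the excess ancestry across individuals. The function names are hypothetical.

```python
import numpy as np

def excess_ancestry_coef(y, excess, global_anc, covariates):
    """OLS coefficient on excess local ancestry, adjusting for global ancestry and covariates."""
    X = np.column_stack([np.ones(len(y)), excess, global_anc, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def permutation_pvalue(y, excess, global_anc, covariates, n_perm=1000, seed=0):
    """Two-sided permutation p-value obtained by shuffling excess ancestry across individuals."""
    rng = np.random.default_rng(seed)
    observed = abs(excess_ancestry_coef(y, excess, global_anc, covariates))
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(excess)  # breaks any link between locus and trait
        if abs(excess_ancestry_coef(y, shuffled, global_anc, covariates)) >= observed:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

Keeping genome-wide ancestry in the model separates a locus-specific signal from the overall ancestry-trait correlation the abstract also reports.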
-
Admixture Mapping of Quantitative Trait Loci for BMI in African Americans: Evidence for Loci on Chromosomes 3q, 5q, and 15q
OBESITY
2009; 17 (6): 1226-1231
Abstract
Obesity is a heritable trait and a major risk factor for highly prevalent common diseases such as hypertension and type 2 diabetes. Previously we showed that BMI was positively correlated with African ancestry among the African Americans (AAs) in the US National Heart, Lung, and Blood Institute's Family Blood Pressure Program (FBPP). In a set of 1,344 unrelated AAs, using Individual Ancestry (IA) estimates at 284 marker locations across the genome, we now present a quantitative admixture mapping analysis of BMI. We used a set of unrelated individuals from Nigeria to represent the African ancestral population and the European Americans (EAs) in the FBPP as the European ancestral population. The analysis was based on a common set of 284 microsatellite markers genotyped in all three groups. We considered the quantitative trait, BMI, as the response variable in a regression analysis with the marker-location-specific excess European ancestry as the explanatory variable. After suitably adjusting for different covariates such as sex, age, and network, we found strong evidence for a positive association with European ancestry at chromosome locations 3q29 and 5q14 and a negative association on chromosome 15q26. To our knowledge, this is the largest quantitative admixture mapping effort in terms of sample size and marker locus involvement for the trait. These results suggest that these regions may harbor genes influencing BMI in the AA population.
View details for DOI 10.1038/oby.2009.24
View details for Web of Science ID 000266383200021
View details for PubMedID 19584881
View details for PubMedCentralID PMC2929755
-
Molecular and Evolutionary History of Melanism in North American Gray Wolves
SCIENCE
2009; 323 (5919): 1339-1343
Abstract
Morphological diversity within closely related species is an essential aspect of evolution and adaptation. Mutations in the Melanocortin 1 receptor (Mc1r) gene contribute to pigmentary diversity in natural populations of fish, birds, and many mammals. However, melanism in the gray wolf, Canis lupus, is caused by a different melanocortin pathway component, the K locus, that encodes a beta-defensin protein that acts as an alternative ligand for Mc1r. We show that the melanistic K locus mutation in North American wolves derives from past hybridization with domestic dogs, has risen to high frequency in forested habitats, and exhibits a molecular signature of positive selection. The same mutation also causes melanism in the coyote, Canis latrans, and in Italian gray wolves, and hence our results demonstrate how traits selected in domesticated species can influence the morphological diversity of their wild relatives.
View details for DOI 10.1126/science.1165448
View details for Web of Science ID 000263876700041
View details for PubMedID 19197024
View details for PubMedCentralID PMC2903542
-
Characterizing the admixed African ancestry of African Americans
GENOME BIOLOGY
2009; 10 (12)
Abstract
Accurate, high-throughput genotyping allows the fine characterization of genetic ancestry. Here we applied recently developed statistical and computational techniques to the question of African ancestry in African Americans by using data on more than 450,000 single-nucleotide polymorphisms (SNPs) genotyped in 94 Africans of diverse geographic origins included in the Human Genome Diversity Project (HGDP), as well as 136 African Americans and 38 European Americans participating in the Atherosclerotic Disease, Vascular Function, and Genetic Epidemiology (ADVANCE) study. To focus on African ancestry, we reduced the data to include only those genotypes in each African American determined statistically to be African in origin. From cluster analysis, we found that all the African Americans are admixed in their African components of ancestry, with the majority contributions being from West and West-Central Africa, and only modest variation in these African-ancestry proportions among individuals. Furthermore, by principal components analysis, we found little evidence of genetic structure within the African component of ancestry in African Americans. These results are consistent with historic mating patterns among African Americans that are largely uncorrelated with African ancestral origins, and they cast doubt on the general utility of mtDNA or Y-chromosome markers alone to delineate the full African ancestry of African Americans. Our results also indicate that the genetic architecture of African Americans is distinct from that of Africans, and that the greatest source of potential genetic stratification bias in case-control studies of African Americans derives from the proportion of European ancestry.
View details for DOI 10.1186/gb-2009-10-12-r141
View details for Web of Science ID 000274289000011
View details for PubMedID 20025784
View details for PubMedCentralID PMC2812948
-
Worldwide patterns of haplotype diversity at 9p21.3, a locus associated with type 2 diabetes and coronary heart disease
GENOME MEDICINE
2009; 1
View details for DOI 10.1186/gm51
View details for Web of Science ID 000208627000051
-
Genome-wide distribution of ancestry in Mexican Americans
HUMAN GENETICS
2008; 124 (3): 207-214
Abstract
Migrations to the New World brought together individuals from Europe, Africa and the Americas. Inter-mating between these migrant and indigenous populations led to the subsequent formation of new admixed populations, such as African and Latino Americans. These unprecedented events brought together genomes that had evolved independently on different continents for tens of thousands of years and presented new environmental challenges for the indigenous and migrant populations, as well as their offspring. These circumstances provided novel opportunities for natural selection to occur that could be reflected in deviations at specific locations from the genome-wide ancestry distribution. Here we present an analysis examining European, Native American and African ancestry based on 284 microsatellite markers in a study of Mexican Americans from the Family Blood Pressure Program. We identified two genomic regions where there was a significant decrement in African ancestry (at 2p25.1, p < 10⁻⁸, and 9p24.1, p < 2 × 10⁻⁵) and one region with a significant increase in European ancestry (at 1p33, p < 2 × 10⁻⁵). These locations may harbor genes that have been subjected to natural selection after the ancestral mixing giving rise to Mexicans.
View details for DOI 10.1007/s00439-008-0541-5
View details for Web of Science ID 000259909500002
View details for PubMedID 18752003
View details for PubMedCentralID PMC3131689
-
Susceptibility locus for clinical and subclinical coronary artery disease at chromosome 9p21 in the multi-ethnic ADVANCE study
HUMAN MOLECULAR GENETICS
2008; 17 (15): 2320-2328
Abstract
A susceptibility locus for coronary artery disease (CAD) at chromosome 9p21 has recently been reported, which may influence the age of onset of CAD. We sought to replicate these findings among white subjects and to examine whether these results are consistent with other racial/ethnic groups by genotyping three single nucleotide polymorphisms (SNPs) in the risk interval in the Atherosclerotic Disease, Vascular Function, and Genetic Epidemiology (ADVANCE) study. One or more of these SNPs was associated with clinical CAD in whites, U.S. Hispanics and U.S. East Asians. None of the SNPs were associated with CAD in African Americans although the power to detect an odds ratio (OR) in this group equivalent to that seen in whites was only 24-30%. ORs were higher in Hispanics and East Asians and lower in African Americans, but in all groups the 95% confidence intervals overlapped with ORs observed in whites. High-risk alleles were also associated with increased coronary artery calcification in controls and the magnitude of these associations by racial/ethnic group closely mirrored the magnitude observed for clinical CAD. Unexpectedly, we noted significant genotype frequency differences between male and female cases (P = 0.003-0.05). Consequently, men tended towards a recessive and women tended towards a dominant mode of inheritance. Finally, an effect of genotype on the age of onset of CAD was detected but only in men carrying two versus one or no copy of the high-risk allele and presenting with CAD at age >50 years. Further investigations in other populations are needed to confirm or refute our findings.
View details for DOI 10.1093/hmg/ddn132
View details for Web of Science ID 000257788300007
View details for PubMedID 18443000
View details for PubMedCentralID PMC2733811
-
Admixture Mapping and the Role of Population Structure for Localizing Disease Genes
GENETIC DISSECTION OF COMPLEX TRAITS, 2ND EDITION
2008; 60: 547-569
Abstract
Admixture mapping, or mapping by admixture linkage disequilibrium, is a disease mapping strategy that has gained considerable popularity in recent years. It exploits the long-range linkage disequilibrium generated by admixture between genetically distinct ancestral populations. Compared to case-control association designs, admixture mapping requires fewer markers, and is more robust to allelic heterogeneity. At the same time, admixture mapping can be more powerful, and can achieve higher mapping resolution than traditional linkage studies, provided that the underlying trait variants occur at sufficiently different frequencies in the ancestral populations. In this chapter, we describe the recent methodology and software development, review successful applications, and comment on the future of this approach.
View details for DOI 10.1016/S0065-2660(07)00419-1
View details for Web of Science ID 000280575900021
View details for PubMedID 18358332
-
IMPROVING POPULATION-SPECIFIC ALLELE FREQUENCY ESTIMATES BY ADAPTING SUPPLEMENTAL DATA: AN EMPIRICAL BAYES APPROACH
ANNALS OF APPLIED STATISTICS
2007; 1 (2): 459-479
Abstract
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than either pooling or not, and sometimes substantially improves over both extremes. The bias introduced is small, as is shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.
View details for DOI 10.1214/07-AOAS121
View details for Web of Science ID 000261057600010
View details for PubMedID 21451739
View details for PubMedCentralID PMC3065192
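To make the pooling trade-off concrete, here is a simplified empirical Bayes sketch in Python. It is not the published estimator, which adaptively weights several related samples; it assumes a single reference sample whose allele frequency p_ref centers a Beta(M·p_ref, M·(1 − p_ref)) prior, and chooses the prior strength M by maximizing the beta-binomial marginal likelihood across SNPs. The function names are hypothetical. Small M corresponds to not pooling, very large M to full pooling.

```python
import numpy as np
from math import lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def marginal_loglik(x, n, p_ref, M):
    """Beta-binomial marginal log-likelihood, dropping the binomial coefficient (free of M)."""
    total = 0.0
    for xi, ni, pi in zip(x, n, p_ref):
        a, b = M * pi, M * (1.0 - pi)
        total += log_beta(xi + a, ni - xi + b) - log_beta(a, b)
    return total

def eb_allele_frequencies(x, n, p_ref, grid=None):
    """Shrink primary-sample frequencies x/n toward reference frequencies p_ref.

    x: allele counts in the primary sample; n: chromosomes typed per SNP.
    Returns the posterior-mean frequencies and the selected prior strength M.
    """
    x = np.asarray(x, float)
    n = np.asarray(n, float)
    p_ref = np.clip(np.asarray(p_ref, float), 1e-3, 1 - 1e-3)  # keeps Beta parameters positive
    if grid is None:
        grid = np.logspace(-1, 4, 101)
    lls = [marginal_loglik(x, n, p_ref, M) for M in grid]
    M_hat = float(grid[int(np.argmax(lls))])
    # posterior mean = weighted average of x/n (weight n) and p_ref (weight M_hat)
    return (x + M_hat * p_ref) / (n + M_hat), M_hat
```

When the reference population closely matches the primary one, the data favor a large M and the estimates borrow heavily; when it does not, M shrinks and the estimates stay close to x/n.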
-
Recent genetic selection in the ancestral admixture of Puerto Ricans
AMERICAN JOURNAL OF HUMAN GENETICS
2007; 81 (3): 626-633
Abstract
Recent studies have used dense markers to examine the human genome in ancestrally homogeneous populations for hallmarks of selection. No genomewide studies have focused on recently admixed groups, that is, populations that have experienced admixing among continentally divided ancestral populations within the past 200-500 years. New World admixed populations are unique in that they represent the sudden confluence of geographically diverged genomes with novel environmental challenges. Here, we present a novel approach for studying selection by examining the genomewide distribution of ancestry in the genetically admixed Puerto Ricans. We find strong statistical evidence of recent selection in three chromosomal regions, including the human leukocyte antigen region on chromosome 6p, chromosome 8q, and chromosome 11q. Two of these regions harbor genes for olfactory receptors. Interestingly, all three regions exhibit deficiencies in the European-ancestry proportion.
View details for DOI 10.1086/520769
View details for Web of Science ID 000249128200019
View details for PubMedID 17701908
View details for PubMedCentralID PMC1950843
-
A statistical method for chromatographic alignment of LC-MS data
BIOSTATISTICS
2007; 8 (2): 357-367
Abstract
Integrated liquid-chromatography mass-spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. The output of the LC-MS system measures the intensity of a peptide with a specific mass-to-charge ratio and retention time. In the last few years, this technology has been used to compare complex biological samples across multiple conditions. One challenge for comparative proteomic profiling with LC-MS is to match corresponding peptide features from different experiments. In this paper, we propose a new method, Peptide Element Alignment (PETAL), that uses raw spectrum data and detected peaks to simultaneously align features from multiple LC-MS experiments. PETAL creates spectrum elements, each of which represents the mass spectrum of a single peptide in a single scan. Peptides detected in different LC-MS data are aligned if they can be represented by the same elements. By considering each peptide separately, PETAL enjoys greater flexibility than time warping methods. While most existing methods process multiple data sets by sequentially aligning each data set to an arbitrarily chosen template data set, PETAL treats all experiments symmetrically and can analyze all experiments simultaneously. We illustrate the performance of PETAL on example data sets.
View details for DOI 10.1093/biostatistics/kxl015
View details for Web of Science ID 000245512000015
View details for PubMedID 16880200
-
Reduced selection leads to accelerated gene loss in Shigella
GENOME BIOLOGY
2007; 8 (8)
Abstract
Obligate pathogenic bacteria lose more genes than facultative pathogens, which, in turn, lose more genes than free-living bacteria. It has been suggested that the increased gene loss in obligate pathogens may be due to a reduction in the effectiveness of purifying selection. Less attention has been given to the causes of increased gene loss in facultative pathogens. We examined in detail the rate of gene loss in two groups of facultative pathogenic bacteria: pathogenic Escherichia coli and Shigella. We show that Shigella strains are losing genes at an accelerated rate relative to pathogenic E. coli. We demonstrate that a genome-wide reduction in the effectiveness of selection contributes to the observed increase in the rate of gene loss in Shigella. When compared with their closely related pathogenic E. coli relatives, the more niche-limited Shigella strains appear to be losing genes at a significantly accelerated rate. A genome-wide reduction in the effectiveness of purifying selection plays a role in creating this observed difference. Our results demonstrate that differences in the effectiveness of selection contribute to differences in the rate of gene loss in facultative pathogenic bacteria. We discuss how the lifestyle and pathogenicity of Shigella may alter the effectiveness of selection, thus influencing the rate of gene loss.
View details for DOI 10.1186/gb-2007-8-8-r164
View details for Web of Science ID 000253938500016
View details for PubMedID 17686180
View details for PubMedCentralID PMC2374995
-
Combining multiple family-based association studies.
BMC proceedings
2007; 1: S162
Abstract
While high-throughput genotyping technologies are becoming readily available, the merit of using these technologies to perform genome-wide association studies has not been established. One major concern is that for studies of complex diseases and traits, the whole-genome approach requires such large sample sizes that both recruitment and genotyping pose considerable challenges. Here we propose a novel statistical method that boosts the effective sample size by combining data obtained from several studies. Specifically, we consider a situation in which various studies have genotyped non-overlapping subjects at largely non-overlapping sets of markers. Our approach, which exploits the local linkage disequilibrium structure without assuming an explicit population model, opens up the possibility of improving statistical power by incorporating existing data into future association studies.
View details for PubMedID 18466508
-
Ancestry-environment interactions and asthma risk among Puerto Ricans
AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE
2006; 174 (10): 1086-1091
Abstract
Puerto Ricans, an admixed population of African, European, and Native American ancestries, have the highest asthma prevalence, morbidity, and mortality rates of any United States population. Although socioeconomic status (SES) is negatively correlated with asthma incidence in most populations, no such relationship has been identified among Puerto Ricans. We hypothesized that, in this admixed population, the association between SES and asthma may interact with genetic ancestry. We analyzed 135 Puerto Rican subjects with asthma and 156 control subjects recruited from six different recruitment centers in Puerto Rico. Individual ancestry for each subject was estimated using 44 ancestry informative markers. SES was assigned using the census tracts' median family income. Analyses of SES were based on the SES of the clinic site from which the subjects were recruited and on a subset of individuals for whom home address-based SES was available. In the two (independent) analyses, we found a significant interaction between SES, ancestry, and asthma disease status. At lower SES, European ancestry was associated with increased risk of asthma, whereas African ancestry was associated with decreased risk. The opposite was true for their higher SES counterparts. The observed interaction may help to explain the unique pattern of risk for asthma in Puerto Ricans and the lack of association with SES observed in previous studies that did not account for varying proportions of ancestry.
View details for DOI 10.1164/RCCM.200605-596OC
View details for Web of Science ID 000241940500005
View details for PubMedCentralID PMC2648109
-
A classical likelihood based approach for admixture mapping using EM algorithm
HUMAN GENETICS
2006; 120 (3): 431-445
Abstract
Several disease-mapping methods have been proposed recently, which use the information generated by recent admixture of populations from historically distinct geographic origins. These methods include both classic likelihood and Bayesian approaches. In this study we directly maximize the likelihood function from the hidden Markov model for admixture mapping using the EM algorithm, allowing for uncertainty in model parameters, such as the allele frequencies in the parental populations. We determined the robustness of the proposed method by examining the ancestral allele frequency estimate and individual marker-location-specific ancestry when the data were generated by different population admixture models and no learning sample was used. The proposed method outperforms a widely used Bayesian MCMC strategy for data generated from various population admixture models. The multipoint information content for ancestry was derived based on the map provided by Smith et al. (2004) and the associated statistical power was calculated. We examined the distribution of admixture LD across the genome for both real and simulated data and established a threshold for genome-wide significance applicable to admixture mapping studies. The software ADMIXPROGRAM for performing admixture mapping is available from the authors.
View details for DOI 10.1007/s00439-006-0224-z
View details for Web of Science ID 000240613900011
View details for PubMedID 16896924
-
Genomewide evolutionary rates in laboratory and wild yeast
GENETICS
2006; 174 (1): 541-544
Abstract
As wild organisms adapt to the laboratory environment, they become less relevant as biological models. It has been suggested that a commonly used S. cerevisiae strain has rapidly accumulated mutations in the lab. We report a low-to-intermediate rate of protein evolution in this strain relative to wild isolates.
View details for DOI 10.1534/genetics.106.060863
View details for Web of Science ID 000241134400048
View details for PubMedID 16816417
-
Population stratification confounds genetic association studies among Latinos
HUMAN GENETICS
2006; 118 (5): 652-664
Abstract
In the United States, asthma prevalence and mortality are the highest among Puerto Ricans and the lowest among Mexicans. Case-control association studies are a powerful strategy for identifying genes of modest effect in complex diseases. However, studies of complex disorders in admixed populations such as Latinos may be confounded by population stratification. We used ancestry informative markers (AIMs) to identify and correct for population stratification among Mexican and Puerto Rican subjects participating in case-control studies of asthma. Three hundred and sixty-two subjects with asthma (Mexican: 181, Puerto Rican: 181) and 359 ethnically matched controls (Mexican: 181, Puerto Rican: 178) were genotyped for 44 AIMs. We observed a greater than expected degree of association between pairs of AIMs on different chromosomes in Mexicans (P < 0.00001) and Puerto Ricans (P < 0.00002), providing evidence for population substructure and/or recent admixture. To assess the effect of population stratification on association studies of asthma, we measured differences in genetic background of cases and controls by comparing allele frequencies of the 44 AIMs. Among Puerto Ricans, but not among Mexicans, we observed a significant overall difference in allele frequencies between cases and controls (P = 0.0002); of 44 AIMs tested, 8 (18%) were significantly associated with asthma. However, after adjustment for individual ancestry, only two of these markers remained significantly associated with the disease. Our findings suggest that empirical assessment of the effects of stratification is critical to appropriately interpret the results of case-control studies in admixed populations.
View details for DOI 10.1007/s00439-005-0071-3
View details for Web of Science ID 000235454100011
View details for PubMedID 16283388
-
Locally weighted transmission/disequilibrium test for genetic association analysis
14th Genetic Analysis Workshop
BIOMED CENTRAL LTD. 2005
Abstract
The transmission/disequilibrium test (TDT) statistic has been used for assessing genetic association in case-parent trios. In the presence of multiple tightly linked marker loci where local dependency may exist, haplotypes are reconstructed statistically to estimate the joint effects of these markers. In this manuscript, we propose an alternative to the haplotype approach by taking a weighted average of multiple loci, where the weight is proportional to the product of (1 − 2θ), with θ the recombination fraction, and the linkage disequilibrium between markers. As an illustration, we applied the method to the simulated Aipotu data.
View details for Web of Science ID 000236103400060
View details for PubMedID 16451673
View details for PubMedCentralID PMC1866722
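A rough Python sketch of the idea, under stated assumptions. The single-marker part is the classic TDT, whose chi-square is (b − c)² / (b + c), where b and c count heterozygous parents transmitting each allele to affected children. The abstract does not specify exactly what is averaged, so this sketch assumes signed per-marker TDT statistics are combined with weights (1 − 2θ)·LD relative to a focal locus; the normalization by the sum of absolute weights and both function names are hypothetical.

```python
import numpy as np

def tdt_z(b, c):
    """Signed single-marker TDT statistic; its square is the 1-df TDT chi-square.

    b, c: counts of heterozygous parents transmitting allele 1 vs allele 2.
    """
    b = np.asarray(b, float)
    c = np.asarray(c, float)
    return (b - c) / np.sqrt(b + c)

def locally_weighted_tdt(b, c, theta, ld):
    """Hypothetical weighted average of per-marker TDT statistics around a focal locus.

    theta: recombination fraction between each marker and the focal locus.
    ld: linkage disequilibrium measure between each marker and the focal locus.
    """
    w = (1 - 2 * np.asarray(theta, float)) * np.asarray(ld, float)
    return float(np.sum(w * tdt_z(b, c)) / np.sum(np.abs(w)))
```

A marker with θ = 0.5 (unlinked) or no LD with the focal locus receives zero weight, so only nearby, correlated markers contribute.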
-
A newly discovered founder population: the Roma/Gypsies
BIOESSAYS
2005; 27 (10): 1084-1094
Abstract
The Gypsies (a misnomer, derived from an early legend about Egyptian origins) defy the conventional definition of a population: they have no nation-state, speak different languages, belong to many religions and comprise a mosaic of socially and culturally divergent groups separated by strict rules of endogamy. Referred to as "the invisible minority", the Gypsies have for centuries been ignored by Western medicine, and their genetic heritage has only recently attracted attention. Common origins from a small group of ancestors characterise the 8-10 million European Gypsies as an unusual trans-national founder population, whose exodus from India played the role of a profound demographic bottleneck. Social and economic pressures within Europe led to gradual fragmentation, generating multiple genetically differentiated subisolates. The string of population bottlenecks and founder effects has shaped a unique genetic profile, whose potential for genetic research can be met only by study designs that acknowledge cultural tradition and self-identity.
View details for DOI 10.1002/bies.20287
View details for Web of Science ID 000232361100012
View details for PubMedID 16163730
-
Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics
GENETIC EPIDEMIOLOGY
2005; 29 (1): 76-86
Abstract
Genetic association studies in admixed populations may be biased if individual ancestry varies within the population and the phenotype of interest is associated with ancestry. However, recently admixed populations also offer potential benefits in association studies since markers informative for ancestry may be in linkage disequilibrium across large distances. In particular, the enhanced LD in admixed populations may be used to identify alleles that underlie a genetically determined difference in a phenotype between two ancestral populations. Asthma is known to have different prevalence and severity among ancestrally distinct populations. We investigated several asthma-related phenotypes in two ancestrally admixed populations: Mexican Americans and Puerto Ricans. We used ancestry informative markers to estimate the individual ancestry of 181 Mexican American asthmatics and 181 Puerto Rican asthmatics and tested whether individual ancestry is associated with any of these phenotypes independently of known environmental factors. We found an association between higher European ancestry and more severe asthma as measured by both forced expiratory volume at 1 second (r=-0.21, p=0.005) and by a clinical assessment of severity among Mexican Americans (OR: 1.55; 95% CI 1.25 to 1.93). We found no significant associations between ancestry and severity or drug responsiveness among Puerto Ricans. These results suggest that asthma severity may be influenced by genetic factors differentiating Europeans and Native Americans in Mexican Americans, although differing results for Puerto Ricans require further investigation.
View details for DOI 10.1002/gepi.20079
View details for Web of Science ID 000230016100007
View details for PubMedID 15918156
-
Estimation of individual admixture: Analytical and study design considerations
GENETIC EPIDEMIOLOGY
2005; 28 (4): 289-301
Abstract
The genome of an admixed individual represents a mixture of alleles from different ancestries. In the United States, the two largest minority groups, African-Americans and Hispanics, are both admixed. An understanding of the admixture proportion at an individual level (individual admixture, or IA) is valuable for both population geneticists and epidemiologists who conduct case-control association studies in these groups. Here we present an extension of a previously described frequentist (maximum likelihood or ML) approach to estimate individual admixture that allows for uncertainty in ancestral allele frequencies. We compare this approach both to prior partial-likelihood-based methods and to more recently described Bayesian MCMC methods. Our full ML method demonstrates increased robustness when compared to an existing partial ML approach. Simulations also suggest that this frequentist estimator achieves similar efficiency, measured by the mean squared error criterion, as Bayesian methods but requires just a fraction of the computational time to produce point estimates, allowing for extensive analysis (e.g., simulations) not possible by Bayesian methods. Our simulation results demonstrate that inclusion of ancestral populations or their surrogates in the analysis is required by any method of IA estimation to obtain reasonable results.
View details for DOI 10.1002/gepi.20064
View details for Web of Science ID 000228573700001
View details for PubMedID 15712363
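To make the frequentist idea concrete, here is a minimal sketch of ML estimation of individual admixture for two ancestral populations. It assumes the ancestral allele frequencies are known exactly, which is the simpler setting the abstract contrasts with; it does not model the uncertainty in ancestral frequencies that the full ML method accounts for. Genotypes are coded as the count (0, 1 or 2) of the allele whose ancestral frequencies are `p_anc1` and `p_anc2`; all function and parameter names are illustrative.

```python
import math


def log_likelihood(q, genotypes, p_anc1, p_anc2):
    """Binomial log-likelihood (up to a constant) of admixture proportion q.

    At each marker the individual's allele frequency is modelled as the
    mixture f = q * p1 + (1 - q) * p2 of the two ancestral frequencies.
    """
    ll = 0.0
    for g, p1, p2 in zip(genotypes, p_anc1, p_anc2):
        f = q * p1 + (1.0 - q) * p2
        f = min(max(f, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


def estimate_admixture(genotypes, p_anc1, p_anc2, tol=1e-8):
    """Golden-section search for the q in [0, 1] that maximizes the likelihood.

    The log-likelihood is concave in q (concave in f, and f is linear in q),
    so a one-dimensional bracketing search finds the global maximum.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if log_likelihood(a, genotypes, p_anc1, p_anc2) < log_likelihood(b, genotypes, p_anc1, p_anc2):
            lo = a  # the maximum lies in [a, hi]
        else:
            hi = b  # the maximum lies in [lo, b]
    return 0.5 * (lo + hi)


# Toy example: four markers, each with ancestral frequencies 0.8 and 0.2.
print(estimate_admixture([2, 1, 1, 1], [0.8] * 4, [0.2] * 4))
```

With identical ancestral frequencies at every marker, the estimate reduces to solving q * 0.8 + (1 - q) * 0.2 = 5/8, so the toy call should return roughly 0.708. A point estimate like this costs only a few dozen likelihood evaluations, which illustrates why ML point estimation is cheap compared with MCMC sampling.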
-
Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies
AMERICAN JOURNAL OF HUMAN GENETICS
2005; 76 (2): 268-275
Abstract
We have analyzed genetic data for 326 microsatellite markers that were typed uniformly in a large multiethnic population-based sample of individuals as part of a study of the genetics of hypertension (Family Blood Pressure Program). Subjects identified themselves as belonging to one of four major racial/ethnic groups (white, African American, East Asian, and Hispanic) and were recruited from 15 different geographic locales within the United States and Taiwan. Genetic cluster analysis of the microsatellite markers produced four major clusters, which showed near-perfect correspondence with the four self-reported race/ethnicity categories. Of 3,636 subjects of varying race/ethnicity, only 5 (0.14%) showed genetic cluster membership different from their self-identified race/ethnicity. On the other hand, we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity--as opposed to current residence--is the major determinant of genetic structure in the U.S. population. Implications of this genetic structure for case-control association studies are discussed.
View details for Web of Science ID 000226215100012
View details for PubMedID 15625622
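The discordance figure above (5 of 3,636 subjects, about 0.14%) is a simple count once each genetic cluster is matched to a self-identified group. The sketch below assumes each cluster is labelled by its most common self-identified group and counts subjects whose own label differs; the matching rule and names are illustrative assumptions, not the study's procedure.

```python
from collections import Counter, defaultdict


def discordance(clusters, labels):
    """Count subjects whose cluster's majority label differs from their own label.

    clusters: cluster assignment per subject (any hashable values)
    labels:   self-identified race/ethnicity per subject
    Returns (number of discordant subjects, discordance rate).
    """
    by_cluster = defaultdict(Counter)
    for c, lab in zip(clusters, labels):
        by_cluster[c][lab] += 1
    # Label each cluster with its most frequent self-identified group.
    majority = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    mismatches = sum(1 for c, lab in zip(clusters, labels) if majority[c] != lab)
    return mismatches, mismatches / len(labels)


# The reported arithmetic: 5 discordant subjects out of 3,636.
print(f"{5 / 3636:.2%}")
```

The formatted check prints `0.14%`, matching the rate given in the abstract.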
-
Ethnicity and human genetic linkage maps
AMERICAN JOURNAL OF HUMAN GENETICS
2005; 76 (2): 276-290
Abstract
Human genetic linkage maps are based on rates of recombination across the genome. These rates in humans vary by the sex of the parent from whom alleles are inherited, by chromosomal position, and by genomic features, such as GC content and repeat density. We have examined--for the first time, to our knowledge--racial/ethnic differences in genetic maps of humans. We constructed genetic maps based on 353 microsatellite markers in four racial/ethnic groups: whites, African Americans, Mexican Americans, and East Asians (Chinese and Japanese). These maps were generated using 9,291 subjects from 2,900 nuclear families who participated in the National Heart, Lung, and Blood Institute-funded Family Blood Pressure Program, the largest sample used for map construction to date. Although the maps for the different groups are generally similar, we did find regional and genomewide differences across ethnic groups, including a longer genomewide map for African Americans than for other populations. Some of this variation was explained by genotyping artifacts--namely, null alleles (i.e., alleles with null phenotypes) at a number of loci--and by ethnic differences in null-allele frequencies. In particular, null alleles appear to be the likely explanation for the excess map length in African Americans. We also found that nonrandom missing data biases map results. However, we found regions on chromosome 8p and telomeric segments with significant ethnic differences and a suggestive interval on chromosome 12q that were not due to genotype artifacts. The difference on chromosome 8p is likely due to a polymorphic inversion in the region. The results of our investigation have implications for inferences of possible genetic influences on human recombination as well as for future linkage studies, especially those involving populations of nonwhite ethnicity.
View details for Web of Science ID 000226215100013
View details for PubMedID 15627237
-
Admixture mapping for hypertension loci with genome-scan markers
NATURE GENETICS
2005; 37 (2): 177-181
Abstract
Identification of genetic variants that contribute to risk of hypertension is challenging. As a complement to linkage and candidate gene association studies, we carried out admixture mapping using genome-scan microsatellite markers among the African American participants in the US National Heart, Lung, and Blood Institute's Family Blood Pressure Program. This population was assumed to have experienced recent admixture from ancestral groups originating in Africa and Europe. We used a set of unrelated individuals from Nigeria to represent the African ancestral population and used the European Americans in the Family Blood Pressure Program to provide estimates of allele frequencies for the European ancestors. We genotyped a common set of 269 microsatellite markers in the three groups at the same laboratory. The distribution of marker location-specific African ancestry, based on multipoint analysis, was shifted upward in hypertensive cases versus normotensive controls, consistent with linkage to genes conferring susceptibility. This shift was largely due to a small number of loci, including five adjacent markers on chromosome 6q and two on chromosome 21q. These results suggest that chromosomal regions 6q24 and 21q21 may contain genes influencing risk of hypertension in African Americans.
View details for DOI 10.1038/ng1510
View details for Web of Science ID 000226690100025
View details for PubMedID 15665825
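The core comparison in admixture mapping is whether locus-specific ancestry is shifted between cases and controls. As a rough illustration only, the sketch below computes a Welch two-sample t statistic at each marker from per-individual local African ancestry estimates, which it assumes have already been inferred (for example by a multipoint method). This generic test is a stand-in chosen for clarity, not the statistic used in the study; the marker names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch t statistic for mean local ancestry in cases minus controls.

    Positive values indicate higher locus-specific ancestry in cases.
    Requires at least two values per group and non-zero total variance.
    """
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def scan_markers(case_ancestry, control_ancestry):
    """Apply welch_t marker by marker.

    Both arguments map a marker name to a list of per-individual local
    ancestry estimates (e.g. 0, 0.5 or 1 copies of African ancestry / 2).
    """
    return {m: welch_t(case_ancestry[m], control_ancestry[m]) for m in case_ancestry}


# Hypothetical data for one marker.
stats = scan_markers({"marker_A": [0.5, 1.0, 1.0, 0.5]},
                     {"marker_A": [0.5, 0.5, 0.0, 0.5]})
print(stats)
```

In a genome scan, a cluster of adjacent markers with large positive statistics would correspond to the kind of upward shift described for 6q and 21q; multiple-testing control and the correlation of ancestry between nearby markers are deliberately left out of this sketch.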
-
Partial deficiency of the C-terminal-domain phosphatase of RNA polymerase II is associated with congenital cataracts facial dysmorphism neuropathy syndrome
NATURE GENETICS
2003; 35 (2): 185-189
Abstract
Congenital cataracts facial dysmorphism neuropathy (CCFDN) syndrome (OMIM 604168) is an autosomal recessive developmental disorder that occurs in an endogamous group of Vlax Roma (Gypsies; refs. 1-3). We previously localized the gene associated with CCFDN to 18qter, where a conserved haplotype suggested a single founder mutation. In this study, we used recombination mapping to refine the gene position to a 155-kb critical interval. During haplotype analysis, we found that the non-transmitted chromosomes of some unaffected parents carried the conserved haplotype associated with the disease. Assuming such parents to be completely homozygous across the critical interval except with respect to the disease-causing mutation, we developed a new 'not quite identical by descent' (NQIBD) approach, which allowed us to identify the mutation causing the disease by sequencing DNA from a single unaffected homozygous parent. We show that CCFDN is caused by a single-nucleotide substitution in an antisense Alu element in intron 6 of CTDP1 (encoding the protein phosphatase FCP1, an essential component of the eukaryotic transcription machinery), resulting in a rare mechanism of aberrant splicing and an Alu insertion in the processed mRNA. CCFDN thus joins the group of 'transcription syndromes' and is the first 'purely' transcriptional defect identified that affects polymerase II-mediated gene expression.
View details for DOI 10.1038/ng1243
View details for Web of Science ID 000185625300017
View details for PubMedID 14517542
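The NQIBD reasoning can be read as a filter: if an unaffected parent carries the disease haplotype on one chromosome and the otherwise identical conserved haplotype on the other, then across the critical interval that parent should be homozygous everywhere except at the disease-causing mutation. The sketch below simply reports heterozygous positions inside the interval from diploid sequence calls; the data layout, positions and names are hypothetical, and real data would also need filtering for sequencing errors.

```python
def nqibd_candidates(calls, interval_start, interval_end):
    """Return sorted positions inside [interval_start, interval_end] where the
    two alleles called in the parent differ (i.e. heterozygous sites).

    calls: dict mapping position -> (allele_1, allele_2)
    """
    return sorted(
        pos
        for pos, (a1, a2) in calls.items()
        if interval_start <= pos <= interval_end and a1 != a2
    )


# Hypothetical calls for a parent sequenced across a critical interval.
calls = {100: ("A", "A"), 250: ("C", "T"), 400: ("G", "G"), 9000: ("A", "G")}
print(nqibd_candidates(calls, 0, 1000))
```

Here the heterozygous site at position 9000 lies outside the interval and is ignored, leaving a single candidate at position 250; under the homozygosity assumption, one remaining site is exactly what the approach expects.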
-
Geographic distribution of disease mutations in the Ashkenazi Jewish population supports genetic drift over selection
AMERICAN JOURNAL OF HUMAN GENETICS
2003; 72 (4): 812-822
Abstract
The presence of four lysosomal storage diseases (LSDs) at increased frequency in the Ashkenazi Jewish population has suggested to many the operation of natural selection (carrier advantage) as the driving force. We compare LSDs and nonlysosomal storage diseases (NLSDs) in terms of the number of mutations, allele-frequency distributions, and estimated coalescence dates of mutations. We also provide new data on the European geographic distribution, in the Ashkenazi population, of seven LSD and seven NLSD mutations. No differences in any of the distributions were observed between LSDs and NLSDs. Furthermore, no regular pattern of geographic distribution was observed for LSD versus NLSD mutations, with some being more common in central Europe and others being more common in eastern Europe within each group. The most striking disparate pattern was the geographic distribution of the two primary Tay-Sachs disease mutations, with the first being more common in central Europe (and likely older) and the second being exclusive to eastern Europe (primarily Lithuania and Russia) (and likely much younger). The latter demonstrates a pattern similar to two other recently arisen Lithuanian mutations, those for torsion dystonia and familial hypercholesterolemia. These observations provide compelling support for random genetic drift (chance founder effects, one approximately 11 centuries ago that affected all Ashkenazim and another approximately 5 centuries ago that affected Lithuanians), rather than selection, as the primary determinant of disease mutations in the Ashkenazi population.
View details for Web of Science ID 000181972600004
View details for PubMedID 12612865
-
Categorization of humans in biomedical research: genes, race and disease.
Genome biology
2002; 3 (7): comment2007
Abstract
A debate has arisen regarding the validity of racial/ethnic categories for biomedical and genetic research. An epidemiologic perspective on the issue of human categorization in biomedical and genetic research strongly supports the continued use of self-identified race and ethnicity.
View details for PubMedID 12184798
-
Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition
GENETICS
2002; 161 (1): 447-459
Abstract
This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on coalescent theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescence times of human male and female populations.
View details for Web of Science ID 000175814900040
View details for PubMedID 12019257
-
Geographic distribution of three Tay-Sachs mutations in Ashkenazim supports drift over selection.
CELL PRESS. 2001: 180–80
View details for Web of Science ID 000171648900027
-
Locating regions of differential variability in DNA and protein sequences
GENETICS
1999; 153 (1): 485-495
Abstract
In the comparison of DNA and protein sequences between species or between paralogues or among individuals within a species or population, there is often some indication that different regions of the sequence are divergent or polymorphic to different degrees, indicating differential constraint or diversifying selection operating in different regions of the sequence. The problem is to test statistically whether the observed regional differences in the density of variant sites represent real differences and then to estimate as accurately as possible the location of the differential regions. A method is given for testing and locating regions of differential variation. The method consists of calculating G(x(k)) = k/n - x(k)/N, where x(k) is the position of the kth variant site along the sequence, n is the total number of variant sites, and N is the total sequence length. The estimated region is the longest stretch of adjacent sequence for which G(x(k)) is monotonically increasing (a hot spot) or decreasing (a cold spot). Critical values of this length for tests of significance are given, a sequential method is developed for locating multiple differential regions, and the power of the method against various alternatives is explored. The method locates the endpoints of hot spots and cold spots of variation with high accuracy.
View details for Web of Science ID 000082421600035
View details for PubMedID 10471728
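The statistic G(x(k)) = k/n - x(k)/N rises between consecutive variant sites when their spacing is smaller than the average spacing N/n and falls when it is larger, so a long monotone run marks a cluster (hot spot) or a gap (cold spot) of variation. The sketch below computes G at each variant site and finds the longest increasing or decreasing run. It assumes run length is measured as the span of sequence between the first and last site of the run; the critical values for significance and the sequential procedure for multiple regions described in the abstract are not reproduced, and the function names are illustrative.

```python
def g_values(positions, seq_length):
    """G(x_k) = k/n - x_k/N for sorted variant positions x_1 <= ... <= x_n."""
    n = len(positions)
    return [(k + 1) / n - x / seq_length for k, x in enumerate(positions)]


def longest_monotone_run(positions, seq_length, increasing=True):
    """Return (start, end) positions of the longest run over which G is
    strictly increasing (hot spot) or strictly decreasing (cold spot).

    Run length is taken here to be the sequence span end - start (an
    assumption of this sketch). Returns None if no step qualifies.
    """
    xs = sorted(positions)
    g = g_values(xs, seq_length)
    best_span, best = 0, None
    start = 0  # index of the variant site where the current run began
    for i in range(1, len(xs)):
        step_ok = g[i] > g[i - 1] if increasing else g[i] < g[i - 1]
        if not step_ok:
            start = i  # run broken: restart from the current site
            continue
        span = xs[i] - xs[start]
        if span > best_span:
            best_span, best = span, (xs[start], xs[i])
    return best


# Toy sequence of length 100 with a cluster of variants near the start.
sites = [5, 10, 12, 14, 16, 18, 60, 95]
print(longest_monotone_run(sites, 100, increasing=True))   # hot spot
print(longest_monotone_run(sites, 100, increasing=False))  # cold spot
```

With eight variant sites in 100 positions the average spacing is 12.5, so the closely spaced sites from 5 to 18 form the hot spot and the sparse stretch from 18 to 95 forms the cold spot.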