Doctor of Philosophy, Johns Hopkins University (2013)
Master of Science, Johns Hopkins University (2010)
Bachelor of Arts, Cornell University (2008)
Carlos Bustamante, Postdoctoral Research Mentor
Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome
A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an 'African Diaspora Power Chip' (ADPC), a genotyping array consisting of tagging SNPs, useful in comprehensively identifying African specific genetic variation. This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we show the results of designing and producing such a microchip array. This novel array covers African specific variation far better than other commercially available arrays, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations. A recent study cataloging variation in continental African populations suggests this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry.
View details for DOI 10.1038/srep46398
View details for Web of Science ID 000399985900001
View details for PubMedID 28429804
Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations
AMERICAN JOURNAL OF HUMAN GENETICS
2017; 100 (4): 635-649
The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
View details for DOI 10.1016/j.ajhg.2017.03.004
View details for Web of Science ID 000398389600006
View details for PubMedID 28366442
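As a rough illustration of the polygenic risk scores mentioned in the abstract above, the sketch below computes a score as a weighted sum of effect-allele dosages. The function name `polygenic_score`, the effect sizes, and the genotypes are all hypothetical and are not taken from the study; the study's own pipeline is not described in enough detail here to reproduce.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], effect_sizes: Sequence[float]) -> float:
    """Return the weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical per-allele effect sizes (for example, GWAS betas) for three variants.
betas = [0.5, -0.25, 1.0]
# Hypothetical effect-allele counts for one individual at the same three variants.
genotype = [2, 1, 0]

score = polygenic_score(genotype, betas)
print(score)
```

Because the weights come from a single GWAS population, the same function applied to genotypes from another population can give biased scores, which is the portability concern the abstract raises.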
Strategies for Enriching Variant Coverage in Candidate Disease Loci on a Multiethnic Genotyping Array
2016; 11 (12)
Investigating genetic architecture of complex traits in ancestrally diverse populations is imperative to understand the etiology of disease. However, the current paucity of genetic research in people of African and Latin American ancestry, Hispanic and indigenous peoples in the United States is likely to exacerbate existing health disparities for many common diseases. The Population Architecture using Genomics and Epidemiology, Phase II (PAGE II), Study was initiated in 2013 by the National Human Genome Research Institute to expand our understanding of complex trait loci in ethnically diverse and well characterized study populations. To meet this goal, the Multi-Ethnic Genotyping Array (MEGA) was designed to substantially improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities at known loci for metabolic, cardiovascular, renal, inflammatory, anthropometric, and a variety of lifestyle traits. Studying the frequency distribution of clinically relevant mutations, putative risk alleles, and known functional variants across multiple populations will provide important insight into the genetic architecture of complex diseases and facilitate the discovery of novel, sometimes population-specific, disease associations. DNA samples from 51,650 self-identified African ancestry (17,328), Hispanic/Latino (22,379), Asian/Pacific Islander (8,640), and American Indian (653) and an additional 2,650 participants of either South Asian or European ancestry, and other reference panels have been genotyped on MEGA by PAGE II. MEGA was designed as a new resource for studying ancestrally diverse populations. Here, we describe the methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to deeper biological understanding of the genetic etiology of complex disease.
View details for DOI 10.1371/journal.pone.0167758
View details for Web of Science ID 000392754300044
View details for PubMedID 27973554
View details for PubMedCentralID PMC5156387
Relative performance of gene- and pathway-level methods as secondary analyses for genome-wide association studies
BMC GENETICS
2015; 16
Genome-wide association study of hepatitis C virus- and cryoglobulin-related vasculitis
GENES AND IMMUNITY
2014; 15 (7): 500-505
The host genetic basis of mixed cryoglobulin vasculitis is not well understood and has not been studied in large cohorts. A genome-wide association study was conducted among 356 hepatitis C virus (HCV) RNA-positive individuals with cryoglobulin-related vasculitis and 447 ethnically matched, HCV RNA-positive controls. All cases had both serum cryoglobulins and a vasculitis syndrome. A total of 899 641 markers from the Illumina HumanOmni1-Quad chip were analyzed using logistic regression adjusted for sex, as well as genetically determined ancestry. Replication of select single-nucleotide polymorphisms (SNPs) was conducted using 91 cases and 180 controls, adjusting for sex and country of origin. The most significant associations were identified on chromosome 6 near the NOTCH4 and MHC class II genes. A genome-wide significant association was detected on chromosome 6 at SNP rs9461776 (odds ratio=2.16, P=1.16 × 10^-7) between HLA-DRB1 and DQA1; this association was further replicated in additional independent samples (meta-analysis P=7.1 × 10^-9). A genome-wide significant association with cryoglobulin-related vasculitis was identified with SNPs near NOTCH4 and MHC Class II genes. The two regions are correlated and it is difficult to disentangle which gene is responsible for the association with mixed cryoglobulinemia vasculitis in this extended major histocompatibility complex region.
View details for DOI 10.1038/gene.2014.41
View details for Web of Science ID 000343960500009
View details for PubMedID 25030430
Admixture analysis of spontaneous hepatitis C virus clearance in individuals of African descent.
Genes and immunity
2014; 15 (4): 241-246
Hepatitis C virus (HCV) infects an estimated 3% of the global population with the majority of individuals (75-85%) failing to clear the virus without treatment, leading to chronic liver disease. Individuals of African descent have lower rates of clearance compared with individuals of European descent and this is not fully explained by social and environmental factors. This suggests that differences in genetic background may contribute to this difference in clinical outcome following HCV infection. Using 473 individuals and 792,721 single-nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS), we estimated local African ancestry across the genome. Using admixture mapping and logistic regression, we identified two regions of interest associated with spontaneous clearance of HCV (15q24, 20p12). A genome-wide significant variant was identified on chromosome 15 at the imputed SNP, rs55817928 (P=6.18 × 10^-8) between the genes SCAPER and RCN. Each additional copy of the African ancestral C allele is associated with 2.4 times the odds of spontaneous clearance. Conditional analysis using this SNP in the logistic regression model explained one-third of the local ancestry association. Additionally, signals of selection in this area suggest positive selection due to some ancestral pathogen or environmental pressure in African, but not in European populations.
View details for DOI 10.1038/gene.2014.11
View details for PubMedID 24622687
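The per-allele odds ratio of 2.4 reported in the abstract above can be made concrete with a small worked example. The sketch assumes a simple log-additive model with no covariates, in which each copy of the allele multiplies the odds by the same factor; the baseline odds of 0.25 and both function names are hypothetical and chosen only for illustration, not taken from the study.

```python
import math


def odds_after_copies(baseline_odds: float, per_allele_or: float, copies: int) -> float:
    """Odds for a carrier of `copies` alleles, assuming a log-additive model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2 for a diploid genotype")
    return baseline_odds * per_allele_or ** copies


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


PER_ALLELE_OR = 2.4   # per-allele odds ratio reported in the abstract
BASELINE_ODDS = 0.25  # hypothetical odds of clearance with zero copies (probability 0.2)

for copies in (0, 1, 2):
    odds = odds_after_copies(BASELINE_ODDS, PER_ALLELE_OR, copies)
    print(copies, round(odds, 3), round(odds_to_probability(odds), 3))
```

Note that the odds scale by 2.4 per copy while the probability does not; the change in probability depends on the baseline, which is why the baseline here is labeled as an assumption.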
Variants in HAVCR1 Gene Region Contribute to Hepatitis C Persistence in African Americans
JOURNAL OF INFECTIOUS DISEASES
2014; 209 (3): 355-359
To confirm previously identified polymorphisms in HAVCR1 that were associated with persistent hepatitis C virus (HCV) infection in individuals of African and of European descent, we studied 165 subjects of African descent and 635 subjects of European descent. Because the association was only confirmed in subjects of African descent (rs6880859; odds ratio, 2.42; P = .01), we then used 379 subjects of African descent (142 with spontaneous HCV clearance) to fine-map HAVCR1. rs111511318 was strongly associated with HCV persistence after adjusting for IL28B and HLA (adjusted P = 8.8 × 10^-4), as was one 81-kb haplotype (adjusted P = .0006). The HAVCR1 genomic region is an independent genetic determinant of HCV persistence in individuals of African descent.
View details for DOI 10.1093/infdis/jit444
View details for Web of Science ID 000329921700009
View details for PubMedID 23964107
Genome-Wide Association Study of Spontaneous Resolution of Hepatitis C Virus Infection: Data From Multiple Cohorts
ANNALS OF INTERNAL MEDICINE
2013; 158 (4): 235-?
Hepatitis C virus (HCV) infections occur worldwide and either spontaneously resolve or persist and markedly increase the person's lifetime risk for cirrhosis and hepatocellular carcinoma. Although HCV persistence occurs more often in persons of African ancestry and persons with genetic variants near interleukin-28B (IL-28B), the genetic basis is not well-understood. To evaluate the host genetic basis for spontaneous resolution of HCV infection. 2-stage, genome-wide association study. 13 international multicenter study sites. 919 persons with serum HCV antibodies but no HCV RNA (spontaneous resolution) and 1482 persons with serum HCV antibodies and HCV RNA (persistence). Frequencies of 792 721 single nucleotide polymorphisms (SNPs). Differences in allele frequencies between persons with spontaneous resolution and persistence were identified on chromosomes 19q13.13 and 6p21.32. On chromosome 19, allele frequency differences localized near IL-28B and included rs12979860 (overall per-allele OR, 0.45; P = 2.17 × 10^-30) and 10 additional SNPs spanning 55 000 base pairs. On chromosome 6, allele frequency differences localized near genes for HLA class II and included rs4273729 (overall per-allele OR, 0.59; P = 1.71 × 10^-16) near DQB1*03:01 and an additional 116 SNPs spanning 1 090 000 base pairs. The associations in chromosomes 19 and 6 were independent and additive and explain an estimated 14.9% (95% CI, 8.5% to 22.6%) and 15.8% (CI, 4.4% to 31.0%) of the variation in HCV resolution in persons of European and African ancestry, respectively. Replication of the chromosome 6 SNP, rs4272729, in an additional 745 persons confirmed the findings (P = 0.015). Epigenetic effects were not studied. IL-28B and HLA class II are independently associated with spontaneous resolution of HCV infection, and SNPs marking IL-28B and DQB1*03:01 may explain approximately 15% of spontaneous resolution of HCV infection.
View details for Web of Science ID 000315580300014
View details for PubMedID 23420232
Polymorphisms in Toll-like receptor genes influence antibody responses to cytomegalovirus glycoprotein B vaccine.
BMC research notes
2012; 5: 140-?
Congenital Cytomegalovirus (CMV) infection is an important medical problem that has yet no current solution. A clinical trial of CMV glycoprotein B (gB) vaccine in young women showed promising efficacy. Improved understanding of the basis for prevention of CMV infection is essential for developing improved vaccines. We genotyped 142 women previously vaccinated with three doses of CMV gB for single nucleotide polymorphisms (SNPs) in TLR 1-4, 6, 7, 9, and 10, and their associated intracellular signaling genes. SNPs in the platelet-derived growth factor receptor (PDGFRA) and integrins were also selected based on their role in binding gB. Specific SNPs in TLR7 and IKBKE (inhibitor of nuclear factor kappa-B kinase subunit epsilon) were associated with antibody responses to gB vaccine. Homozygous carriers of the minor allele at four SNPs in TLR7 showed higher vaccination-induced antibody responses to gB compared to heterozygotes or homozygotes for the common allele. SNP rs1953090 in IKBKE was associated with changes in antibody level from second to third dose of vaccine; homozygotes for the minor allele exhibited lower antibody responses while homozygotes for the major allele showed increased responses over time. These data contribute to our understanding of the immunogenetic mechanisms underlying variations in the immune response to CMV vaccine.
View details for DOI 10.1186/1756-0500-5-140
View details for PubMedID 22414065
Identification of functional genetic variation in exome sequence analysis.
2011; 5: S13-?
Recent technological advances have allowed us to study individual genomes at a base-pair resolution and have demonstrated that the average exome harbors more than 15,000 genetic variants. However, our ability to understand the biological significance of the identified variants and to connect these observed variants with phenotypes is limited. The first step in this process is to identify genetic variation that is likely to result in changes to protein structure and function, because detailed studies, either population based or functional, for each of the identified variants are not practicable. Therefore algorithms that yield valid predictions of a variant's functional significance are needed. Over the past decade, several programs have been developed to predict the probability that an observed sequence variant will have a deleterious effect on protein function. These algorithms range from empirical programs that classify using known biochemical properties to statistical algorithms trained using a variety of data sources, including sequence conservation data, biochemical properties, and functional data. Using data from the pilot3 study of the 1000 Genomes Project available through Genetic Analysis Workshop 17, we compared the results of four programs (SIFT, PolyPhen, MAPP, and VarioWatch) used to predict the functional relevance of variants in 101 genes. Analysis was conducted without knowledge of the simulation model. Agreement between programs was modest ranging from 59.4% to 71.4% and only 3.5% of variants were classified as deleterious and 10.9% as tolerated across all four programs.
View details for DOI 10.1186/1753-6561-5-S9-S13
View details for PubMedID 22373437