Alexander Ioannidis

Assistant Professor (Research) of Genetics and of Biomedical Data Science
Adjunct Professor, Institute for Computational and Mathematical Engineering (ICME)

Bio

Dr. Ioannidis earned his Ph.D. from Stanford University in Computational and Mathematical Engineering together with an M.S. in Management Science and Engineering (Optimization). He graduated summa cum laude from Harvard University in Chemistry and Physics and earned an M.Phil at the University of Cambridge from the Department of Applied Math and Theoretical Physics in Computational Biology. His research focuses on the design of algorithms and application of computational methods for problems in precision health, genomics, clinical data science, and AI in healthcare.

Academic Appointments

Assistant Professor-Research, Genetics
Assistant Professor-Research, Department of Biomedical Data Science
Adjunct Professor, Institute for Computational and Mathematical Engineering (ICME)

Professional Education

Ph.D., Stanford, Computational and Mathematical Engineering
M.S., Stanford, Management Science and Engineering
M.Phil, University of Cambridge, Computational Biology
B.A., summa cum laude, Harvard, Chemistry and Physics

All Publications

Clinical genetic variation across Hispanic populations in the Mexican Biobank. Nature medicine Barberena-Jonas, C., Medina-Munoz, S. G., Cedillo-Castelan, V., Sepulveda-Morales, T., Gonzaga-Jauregui, C., ENSA Genomics Consortium, Garcia-Garcia, L., Ioannidis, A. G., Moreno-Estrada, A., Aguilar-Salinas, C., Barberena-Jonas, C., Canizales-Quintero, S., Cruz-Hervert, L. P., Delgado-Sanchez, G., Ferreira-Guerrero, E., Ferreyra-Reyes, L., Gutierrez-Lopez, C., Hernandez-Avila, J. E., Huerta-Chagoya, A., Juarez-Figueroa, L., Kuri-Morales, P., Lazcano-Ponce, E., Magis-Rodriguez, C., Mongua-Rodriguez, N., Moreno-Macias, H., Ortega-Estrada, M. d., Palma-Martinez, M. J., Quinto-Cortes, C. D., Rodriguez-Guillen, R., Ordonez-Sanchez, M. L., Sarti-Gutierrez, E., Sandoval, K., Sepulveda-Amor, J., Sohail, M., Tapia-Conyer, R., Tusie-Luna, M. T., Valdespino-Gomez, J. L., Tellez-Vazquez, N., Velazquez-Monroy, O., Velazquez-Meza, M. 2026

Abstract

Genetic testing for specific alleles is often recommended based on an individual's ancestry. However, the frequency of pathogenic and pharmacogenomic alleles across different Hispanic groups has not been well characterized, and existing guidelines often fail to recognize the geographic and ancestral diversity within these populations. Here analyzing data from 6,011 individuals from the nationwide Mexican Biobank, we show that Mexican individuals have striking regional differences in biomedically relevant allele frequencies, shaped both by their overall admixture proportions, but also by the local Indigenous ancestral groups contributing to their genome (for example, Nahua in central Mexico, Zapotec in the South or Maya in the Yucatan peninsula). We found ancestry-specific patterns with clinical implications that could not have been detected without a local ancestry-informed approach, including variants affecting fentanyl (rs2242480) and statin (rs4149056) metabolism, examples particularly relevant to the epidemiology of Hispanic populations. This analysis framework could inform genetic testing guidelines across the Americas. We are making available the results for 42,769 biomedically relevant genotyped variants through MexVar, a user-friendly platform designed to improve access to genomic data for the scientific community and support genetic analyses for populations of Mexican descent worldwide.

View details for DOI 10.1038/s41591-025-04100-z

View details for PubMedID 41566040
Autoencoders for genomic variation analysis. Genome research Geleta, M., Montserrat, D. M., Giro-I-Nieto, X., Ioannidis, A. G. 2026

Abstract

Modern biobanks are providing numerous high-resolution genomic sequences of diverse populations. In order to account for diverse and admixed populations, new algorithmic tools are needed in order to properly capture the genetic composition of populations. Here, we explore deep learning techniques, namely, variational autoencoders (VAEs), to process genomic data from a population perspective. We show the power of VAEs for a variety of tasks relating to the interpretation, compression, classification, and simulation of genomic data with several worldwide whole genome data sets from both humans and canids, and evaluate the performance of the proposed applications with and without ancestry conditioning. The unsupervised setting of autoencoders allows for the detection and learning of granular population structure and inferring of informative latent factors. The learned latent spaces of VAEs are able to capture and represent differentiated Gaussian-like clusters of samples with similar genetic composition on a fine scale from single nucleotide polymorphisms (SNPs), enabling applications in dimensionality reduction and data simulation. These individual genotype sequences can then be decomposed into latent representations and reconstruction errors (residuals), which provide a sparse representation useful for lossless compression. We show that different populations have differentiated compression ratios and classification accuracies. Additionally, we analyze the entropy of the SNP data, its effect on compression across populations, and its relation to historical migrations, and we show how to introduce autoencoders into existing compression pipelines.

View details for DOI 10.1101/gr.280086.124

View details for PubMedID 41558827
Neural ADMIXTURE for rapid genomic clustering. Nature computational science Mantes, A. D., Montserrat, D. M., Bustamante, C. D., Giró-I-Nieto, X., Ioannidis, A. G. 2023; 3 (7): 621-629

Abstract

Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments with each cluster representing a vector of DNA variant frequencies. However, with rapidly increasing biobank sizes, these methods have become computationally intractable. Here we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as the current standard algorithm, ADMIXTURE, while reducing the compute time by orders of magnitude surpassing even the fastest alternatives. One month of continuous compute using ADMIXTURE can be reduced to just hours with Neural ADMIXTURE. A multi-head approach allows Neural ADMIXTURE to offer even further acceleration by calculating multiple cluster numbers in a single run. Furthermore, the models can be stored, allowing cluster assignment to be performed on new data in linear time without needing to share the training samples.

View details for DOI 10.1038/s43588-023-00482-7

View details for PubMedID 37600116

View details for PubMedCentralID PMC10438426
Deconvoluting complex correlates of COVID-19 severity with a multi-omic pandemic tracking strategy. Nature communications Parikh, V. N., Ioannidis, A. G., Jimenez-Morales, D., Gorzynski, J. E., De Jong, H. N., Liu, X., Roque, J., Cepeda-Espinoza, V. P., Osoegawa, K., Hughes, C., Sutton, S. C., Youlton, N., Joshi, R., Amar, D., Tanigawa, Y., Russo, D., Wong, J., Lauzon, J. T., Edelson, J., Mas Montserrat, D., Kwon, Y., Rubinacci, S., Delaneau, O., Cappello, L., Kim, J., Shoura, M. J., Raja, A. N., Watson, N., Hammond, N., Spiteri, E., Mallempati, K. C., Montero-Martín, G., Christle, J., Kim, J., Kirillova, A., Seo, K., Huang, Y., Zhao, C., Moreno-Grau, S., Hershman, S. G., Dalton, K. P., Zhen, J., Kamm, J., Bhatt, K. D., Isakova, A., Morri, M., Ranganath, T., Blish, C. A., Rogers, A. J., Nadeau, K., Yang, S., Blomkalns, A., O'Hara, R., Neff, N. F., DeBoever, C., Szalma, S., Wheeler, M. T., Gates, C. M., Farh, K., Schroth, G. P., Febbo, P., deSouza, F., Cornejo, O. E., Fernandez-Vina, M., Kistler, A., Palacios, J. A., Pinsky, B. A., Bustamante, C. D., Rivas, M. A., Ashley, E. A. 2022; 13 (1): 5107

Abstract

The SARS-CoV-2 pandemic has differentially impacted populations across race and ethnicity. A multi-omic approach represents a powerful tool to examine risk across multi-ancestry genomes. We leverage a pandemic tracking strategy in which we sequence viral and host genomes and transcriptomes from nasopharyngeal swabs of 1049 individuals (736 SARS-CoV-2 positive and 313 SARS-CoV-2 negative) and integrate them with digital phenotypes from electronic health records from a diverse catchment area in Northern California. Genome-wide association disaggregated by admixture mapping reveals novel COVID-19-severity-associated regions containing previously reported markers of neurologic, pulmonary and viral disease susceptibility. Phylodynamic tracking of consensus viral genomes reveals no association with disease severity or inferred ancestry. Summary data from multiomic investigation reveals metagenomic and HLA associations with severe COVID-19. The wealth of data available from residual nasopharyngeal swabs in combination with clinical data abstracted automatically at scale highlights a powerful strategy for pandemic tracking, and reveals distinct epidemiologic, genetic, and biological associations for those at the highest risk.

View details for DOI 10.1038/s41467-022-32397-8

View details for PubMedID 36042219
Archetypal Analysis for population genetics. PLoS computational biology Gimbernat-Mayol, J., Dominguez Mantes, A., Bustamante, C. D., Mas Montserrat, D., Ioannidis, A. G. 2022; 18 (8): e1010301

Abstract

The estimation of genetic clusters using genomic data has application from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally-intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in such a fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.

View details for DOI 10.1371/journal.pcbi.1010301

View details for PubMedID 36007005
Paths and timings of the peopling of Polynesia inferred from genomic networks. Nature Ioannidis, A. G., Blanco-Portillo, J., Sandoval, K., Hagelberg, E., Barberena-Jonas, C., Hill, A. V., Rodriguez-Rodriguez, J. E., Fox, K., Robson, K., Haoa-Cardinali, S., Quinto-Cortes, C. D., Miquel-Poblete, J. F., Auckland, K., Parks, T., Sofro, A. S., Avila-Arcos, M. C., Sockell, A., Homburger, J. R., Eng, C., Huntsman, S., Burchard, E. G., Gignoux, C. R., Verdugo, R. A., Moraga, M., Bustamante, C. D., Mentzer, A. J., Moreno-Estrada, A. 2021; 597 (7877): 522-526

Abstract

Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth1, but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys2-4. Here, using genome-wide data from merely 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Totaiete ma) Islands (11th century), the western Austral (Tuha'a Pae) Islands and Tuamotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua 'Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.

View details for DOI 10.1038/s41586-021-03902-8

View details for PubMedID 34552258
Mapping the human genetic architecture of COVID-19. Nature COVID-19 Host Genetics Initiative 2021

Abstract

The genetic makeup of an individual contributes to susceptibility and response to viral infection. While environmental, clinical and social factors play a role in exposure to SARS-CoV-2 and COVID-19 disease severity1,2, host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. We describe the results of three genome-wide association meta-analyses comprised of up to 49,562 COVID-19 patients from 46 studies across 19 countries. We reported 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations to lung or autoimmune and inflammatory diseases3-7. They also represent potentially actionable mechanisms in response to infection. Mendelian Randomization analyses support a causal role for smoking and body mass index for severe COVID-19 although not for type II diabetes. The identification of novel host genetic factors associated with COVID-19, with unprecedented speed, was made possible by the community of human genetic researchers coming together to prioritize sharing of data, results, resources and analytical frameworks. This working model of international collaboration underscores what is possible for future genetic discoveries in emerging pandemics, or indeed for any complex human disease.

View details for DOI 10.1038/s41586-021-03767-x

View details for PubMedID 34237774
Native American gene flow into Polynesia predating Easter Island settlement. Nature Ioannidis, A. G., Blanco-Portillo, J., Sandoval, K., Hagelberg, E., Miquel-Poblete, J. F., Moreno-Mayar, J. V., Rodriguez-Rodriguez, J. E., Quinto-Cortes, C. D., Auckland, K., Parks, T., Robson, K., Hill, A. V., Avila-Arcos, M. C., Sockell, A., Homburger, J. R., Wojcik, G. L., Barnes, K. C., Herrera, L., Berrios, S., Acuna, M., Llop, E., Eng, C., Huntsman, S., Burchard, E. G., Gignoux, C. R., Cifuentes, L., Verdugo, R. A., Moraga, M., Mentzer, A. J., Bustamante, C. D., Moreno-Estrada, A. 2020

Abstract

The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers. Proponents have pointed to the existence of New World crops, such as the sweet potato and bottle gourd, in the Polynesian archaeological record, but nowhere else outside the pre-Columbian Americas1-6, while critics have argued that these botanical dispersals need not have been human mediated7. The Norwegian explorer Thor Heyerdahl controversially suggested that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui)2. Several limited molecular genetic studies have reached opposing conclusions, and the possibility continues to be as hotly contested today as it was when first suggested8-12. Here we analyse genome-wide variation in individuals from islands across Polynesia for signs of Native American admixture, analysing 807 individuals from 17 island populations and 15 Pacific coast Native American groups. We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200) contemporaneous with the settlement of remote Oceania13-15. Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.

View details for DOI 10.1038/s41586-020-2487-2

View details for PubMedID 32641827
Ultra-low-power superconductor logic. Journal of applied physics Herr, Q. P., Herr, A. Y., Oberg, O. T., Ioannidis, A. G. 2011; 109 (10)

View details for DOI 10.1063/1.3585849

View details for Web of Science ID 000292115900092
The landscape of genomic and socioeconomic variables in colorectal cancer patients based on genetic ancestry. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology Srinivasan, P., Bristow, S. L., Mendez, F. L., Krinshpun, S., Jurdi, A., Liu, M. C., Rabinowitz, M., Wall, J., Bustamante, C. D., Ioannidis, A. G., De La Vega, F. M., Mitchell, B. L., Aleshin, A., Reiter, J. G. 2026

Abstract

BACKGROUND: Despite differences in tumor alterations across genetic ancestries, investigations of the colorectal cancer (CRC) molecular landscape have used self-reported ethnicity instead of genetic ancestry. METHODS: We used tumor and matched normal whole-exome sequencing data from 16,388 stage I-IV CRC patients to investigate CRC's germline and somatic molecular landscape and the potential influence of socioeconomic factors (Distressed Community Index, DCI) across diverse genetic ancestries. Genetic ancestry determined via supervised local ancestry inference included African (AFR, N=1697), Native American (AMR, N=1291), East Asian (EAS, N=2247), European (EUR, N=9726), Levantine Middle Eastern (LME, N=1192), and South Asian (SAS, N=184). RESULTS: Microsatellite instability was the most common form of hypermutation (80.8%), higher in EUR compared to AFR, AMR, and EAS. Among germline findings, positive results were most common in high-penetrance genes associated with Lynch syndrome. Enrichment patterns included MLH1 (SAS) and PMS2 (AFR). There were significant differences in the frequency of driver mutations in APC, BRAF, KRAS, TP53, and PIK3CA between the EUR and other ancestry groups in both MSI and MSS tumors. Mutational signatures suggested enrichment of reactive oxygen species and POLE in AFR, colibactin in EAS, and aflatoxin and NTHL1 in SAS. DCI scores differed by ancestry (higher distress in AFR/AMR than in EUR), but driver mutation frequencies did not vary across DCI quintiles. CONCLUSIONS: Genetic ancestry shapes hereditary risk, tumor biology, and environmental exposures. IMPACT: These findings suggest that incorporating ancestry into screening, trials, and precision oncology may improve equity, though outcome-linked prospective studies and implementation research are warranted.

View details for DOI 10.1158/1055-9965.EPI-26-0020

View details for PubMedID 42084482
Signatures of pathogen-driven selection and Austronesian gene flow of Papua New Guinea HLA alleles. American journal of human genetics Font-Porterias, N., Nemat-Gorgani, N., Kichula, K. M., Al-Hindi, D. R., Harrison, G. F., Tao, S., Zhu, F., Montero-Martin, G., Fernández-Viña, M. A., Guethlein, L. A., Parham, P., Oppenheimer, S. J., Ioannidis, A. G., Moreno-Estrada, A., Pomat, W., Mentzer, A. J., Henn, B. M., Norman, P. J. 2026

Abstract

Human leukocyte antigen (HLA) class I and II are cell surface proteins that display peptide antigens to immune cells, thereby mediating detection of infected cells and production of antibodies. Pathogen exposure and demographic events, including local adaptation and admixture, have driven and maintained exceptional polymorphism of HLA genes across human populations. Papua New Guinea has a complex demography, with geographically distinct populations in the highlands and lowlands and exceptional linguistic heterogeneity throughout the island. The lowland populations retain signatures of Austronesian expansion ∼3,000 years ago. Papua New Guinea populations are also differentially exposed to endemic malarial pathogens, with a greater burden in the lowlands. We analyzed genome-wide autosomal SNP data together with HLA allele sequences, linguistic, and geographical data from 337 Papuans. We find the substructure of HLA alleles to be highly correlated with altitude in Papua New Guinea, a signal that is distinct from the rest of the genome. In addition, specific HLA-B and HLA-DP alleles in lowland groups have a greater number of homozygous genotypes than expected under neutrality. Some of these HLA alleles are of Austronesian genetic ancestry. We find that the HLA-binding repertoires at candidate loci are significantly enriched for antigenic P. falciparum-derived peptides. Together, these results indicate that pathogen-driven selective pressures correlate with the observed HLA genetic substructure in Papua New Guinea, highlighting the critical importance of characterizing highly complex HLA variation in understanding differences in disease susceptibility across diverse human groups.

View details for DOI 10.1016/j.ajhg.2026.04.006

View details for PubMedID 42086051
Pan-African model explains Homo sapiens genetic and morphological evolution. bioRxiv : the preprint server for biology Padilla-Iglesias, C., Xue, Z., Leonardi, M., Perez, M. F., Paijmans, J. L., Colucci, M., Hovhannisyan, A., Maisano-Delser, P., Blanco-Portillo, J., Ioannidis, A. G., Lucarini, G., Cerasoni, J. N., Kandel, A. W., Will, M., Hallett, E. Y., Krapp, M., Lupo, K., Scerri, E. M., Crevecoeur, I., Vinicius, L., Migliano, A. B., Manica, A. 2026

Abstract

A growing body of evidence has challenged the traditional assumption of a single-region origin for Homo sapiens, suggesting instead that our species originated from multiple geographically distinct populations in Africa, which intermittently exchanged genes and culture. However, our understanding of how this Pan-African metapopulation would have evolved through time is still limited. Furthermore, the drivers of such changes are uncertain, and quantitative models of the respective contributions of different African regions are lacking. Here we provide a complete reconstruction of the meta-population dynamics over the last 200,000 years by quantitatively integrating an ecological niche model, informed by archaeological sites, within a spatially explicit population genetic framework. The inferred metapopulation dynamics account for the divergence among all available contemporary and ancient genomes of African hunter-gatherers used to calibrate the model. In addition, it also accurately predicts the patterns of craniometric diversification across the continent from the Late Middle Pleistocene to the present. Finally, we show how the climate-driven changes in population sizes and connectivity are congruent with major patterns of archaeological and phenotypic diversification over the last 200,000 years across the African continent.

View details for DOI 10.1101/2025.05.22.655514

View details for PubMedID 42094577

View details for PubMedCentralID PMC13142462
Graph transformer for ancient ancestry inference. bioRxiv : the preprint server for biology Shanks, C., Bonet, D., Cara, M. C., Ioannidis, A. 2026

Abstract

Local ancestry inference classifies segments of DNA in admixed individuals by their originating population. However, as the date of admixture becomes older, these segments become shorter and determining their ancestry becomes increasingly difficult. This limits many existing segment-based methods to relatively recent historical admixture events and more highly diverged populations. The rapidly expanding availability of ancient DNA offers a promising opportunity to use these ancient samples as references for local ancestry inference. A recent approach integrates ancient samples into the ancestral recombination graph (ARG) for local ancestry inference. Here, we introduce recent advances in deep learning for graphs into this ARG framework to create ARGMix, a graph transformer that infers local ancestry using the coalescent trees of the inferred ARG. Our approach employs ancient samples as references in the marginal trees to predict local ancestry. We train ARGMix on data reflecting the well-understood ancient European demography and demonstrate improved accuracy and robustness even under demographic misspecification. We then apply ARGMix to an ARG of ancient and present-day European samples for ancestry-specific analyses, finding evidence of continuity between Otzi the Iceman and present-day individuals from nearby regions.

View details for DOI 10.64898/2026.04.05.714076

View details for PubMedID 41993422
Session Introduction: AI and Machine Learning in Clinical Medicine Bridging or Separating Model Intelligence and Human Expertise. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Haredasht, F. N., Romano, J. D., Beaulieu-Jones, B. K., Kim, D., Ioannidis, A., Tison, G. H., Daneshjou, R., Chen, J. H. 2026; 31: 1-11

Abstract

Artificial Intelligence (AI) technologies continue to expand their role in clinical medicine, with large language models (LLMs) and multimodal systems now applied to communication, imaging, and predictive analytics. Advances in generative and retrieval-augmented methods have improved the accuracy and contextual grounding of clinical summaries, patient messaging, and decision support. At the same time, new benchmarks in imaging, vision, and spontaneous speech have underscored both progress and the persistence of unsolved challenges. Predictive modeling efforts highlight causality, longitudinal trajectories, and informative clinical events, while methodological contributions emphasize uncertainty management, abstention, and interpretable causal structures. Finally, frameworks for evaluation and governance address the crucial gap between laboratory performance and real-world deployment.

View details for DOI 10.1142/9789819824755_0001

View details for PubMedID 41758129
Analysis of a deeply-phenotyped familial hypercholesterolemia cohort from Mexico shows a role for both rare and common alleles across known dyslipidemia genes and reveals structural variation in a novel locus. Human genomics Katsanis, N., Mourtzi, N., Quinto-Cortés, C. D., Martagon, A. J., Ioannidis, A. G., De La Vega, F. M., Gulcher, J., Lee, M. T., Faghihi, M. A., Lopez-Pineda, A., Moreno-Grau, S., Montserrat, D. M., Barrabés, M., Bonet, D., Fernandez, P. S., Wall, J., Moatamed, B., Mehta, R., Galan-Ramirez, G. A., Zubirán, R., Elias-Lopez, D., Tusié-Luna, T., Aguilar-Salinas, C. A., Bustamante, C. D. 2025; 19 (1): 141

Abstract

Familial hypercholesterolemia (FH) is a genetic disorder driven in part by mutations in three genes that encode components of the cholesterol pathway: LDLR, APOB, and PCSK9. However, the majority of FH genetics has been performed in individuals of European descent. Here, we leveraged a cohort of 300 patients from the Mexican FH registry to understand how rare, high liability alleles and common variants might contribute to shaping individual risk. Using a combination of whole exome and of short- and long-read whole genome sequencing, we report three key findings. First, we observed that rare pathogenic point mutations and structural variants in all known FH genes, together with variants in APOE, CREB3L3, and PLIN1, contribute to a molecular FH diagnosis in 67% of families, including novel gene-disruptive copy number variants (CNVs) which arose in a native American background. Second, ancestry-adjusted polygenic risk score analysis identified a significant liability for coronary artery disease, hypertension, LDL, HDL, and Type 2 Diabetes. The polygenic signal for LDL was present in patients with rare, pathogenic FH mutations and was more prominent in individuals bereft of a molecular FH diagnosis. Finally, we report both a whole-gene duplication and common, non-coding variants in a novel locus, PDZK1, which contribute to the genetic burden of FH, a finding we replicated in the UK Biobank (UKB). Together, our analyses illustrate the value of genetic studies in non-European populations and reinforce the notion that individual risk to disease can arise from both rare, large effect alleles (alone or in combination across genes) and common variants that increase the mutational burden of a biological system.

View details for DOI 10.1186/s40246-025-00831-9

View details for PubMedID 41287107

View details for PubMedCentralID 5389990
Advances in Biomedical Missing Data Imputation: A Survey. IEEE access Barrabes, M., Perera, M., Novelle Moriano, V., Giro-I-Nieto, X., Mas Montserrat, D., Ioannidis, A. G. 2025; 13: 16918-16932

View details for DOI 10.1109/ACCESS.2024.3516506

View details for Web of Science ID 001410383800049
Feature Shift Localization Network Barrabes, M., Montserrat, D., Dev, K., Ioannidis, A. G. edited by Singh, A., Fazel, M., Hsu, D., Lacoste-Julien, S., Berkenkamp, F., Maharaj, T., Wagstaff, K., Zhu, J. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2025

View details for Web of Science ID 001669603900110
Compressive Meta-Learning Montserrat, D., Bonet, D., Perera, M., Giro-I-Nieto, X., Ioannidis, A. G., ACM ASSOC COMPUTING MACHINERY. 2025: 2102-2113

View details for DOI 10.1145/3711896.3736889

View details for Web of Science ID 001592428900183
An archaic HLA class I receptor allele diversifies natural killer cell-driven immunity in First Nations peoples of Oceania. Cell Loh, L., Saunders, P. M., Faoro, C., Font-Porterias, N., Nemat-Gorgani, N., Harrison, G. F., Sadeeq, S., Hensen, L., Wong, S. C., Widjaja, J., Clemens, E. B., Zhu, S., Kichula, K. M., Tao, S., Zhu, F., Montero-Martin, G., Fernandez-Vina, M., Guethlein, L. A., Vivian, J. P., Davies, J., Mentzer, A. J., Oppenheimer, S. J., Pomat, W., Ioannidis, A. G., Barberena-Jonas, C., Moreno-Estrada, A., Miller, A., Parham, P., Rossjohn, J., Tong, S. Y., Kedzierska, K., Brooks, A. G., Norman, P. J. 2024

Abstract

Genetic variation in host immunity impacts the disproportionate burden of infectious diseases that can be experienced by First Nations peoples. Polymorphic human leukocyte antigen (HLA) class I and killer cell immunoglobulin-like receptors (KIRs) are key regulators of natural killer (NK) cells, which mediate early infection control. How this variation impacts their responses across populations is unclear. We show that HLA-A∗24:02 became the dominant ligand for inhibitory KIR3DL1 in First Nations peoples across Oceania, through positive natural selection. We identify KIR3DL1∗114, widespread across and unique to Oceania, as an allele lineage derived from archaic humans. KIR3DL1∗114+NK cells from First Nations Australian donors are inhibited through binding HLA-A∗24:02. The KIR3DL1∗114 lineage is defined by phenylalanine at residue 166. Structural and binding studies show phenylalanine 166 forms multiple unique contacts with HLA-peptide complexes, increasing both affinity and specificity. Accordingly, assessing immunogenetic variation and the functional implications for immunity are fundamental toward understanding population-based disease associations.

View details for DOI 10.1016/j.cell.2024.10.005

View details for PubMedID 39476840
Polygenic risk score portability for common diseases across genetically diverse populations. Human genomics Moreno-Grau, S., Vernekar, M., Lopez-Pineda, A., Mas-Montserrat, D., Barrabés, M., Quinto-Cortés, C. D., Moatamed, B., Lee, M. T., Yu, Z., Numakura, K., Matsuda, Y., Wall, J. D., Ioannidis, A. G., Katsanis, N., Takano, T., Bustamante, C. D. 2024; 18 (1): 93

Abstract

Polygenic risk scores (PRS) derived from European individuals have reduced portability across global populations, limiting their clinical implementation at worldwide scale. Here, we investigate the performance of a wide range of PRS models across four ancestry groups (Africans, Europeans, East Asians, and South Asians) for 14 conditions of high-medical interest. To select the best-performing model per trait, we first compared PRS performances for publicly available scores, and constructed new models using different methods (LDpred2, PRS-CSx and SNPnet). We used 285 K European individuals from the UK Biobank (UKBB) for training and 18 K, including diverse ancestries, for testing. We then evaluated PRS portability for the best models in Europeans and compared their accuracies with respect to the best PRS per ancestry. Finally, we validated the selected PRS models using an independent set of 8,417 individuals from Biobank of the Americas-Genomelink (BbofA-GL); and performed a PRS-Phewas. We confirmed a decay in PRS performances relative to Europeans when the evaluation was conducted using the best-PRS model for Europeans (51.3% for South Asians, 46.6% for East Asians and 39.4% for Africans). We observed an improvement in the PRS performances when specifically selecting ancestry specific PRS models (phenotype variance increase: 1.62 for Africans, 1.40 for South Asians and 0.96 for East Asians). Additionally, when we selected the optimal model conditional on ancestry for CAD, HDL-C and LDL-C, hypertension, hypothyroidism and T2D, PRS performance for studied populations was more comparable to what was observed in Europeans. Finally, we were able to independently validate tested models for Europeans, and conducted a PRS-Phewas, identifying cross-trait interplay between cardiometabolic conditions, and between immune-mediated components. Our work comprehensively evaluated PRS accuracy across a wide range of phenotypes, reducing the uncertainty with respect to which PRS model to choose and in which ancestry group. This evaluation has let us identify specific conditions where implementing risk-prioritization strategies could have practical utility across diverse ancestral groups, contributing to democratizing the implementation of PRS.

View details for DOI 10.1186/s40246-024-00664-y

View details for PubMedID 39218908

View details for PubMedCentralID PMC11367857
Genetic Signatures of Positive Selection in Human Populations Adapted to High Altitude in Papua New Guinea. Genome biology and evolution Gonzalez-Buenfil, R., Vieyra-Sanchez, S., Quinto-Cortes, C. D., Oppenheimer, S. J., Pomat, W., Laman, M., Cervantes-Hernandez, M. C., Barberena-Jonas, C., Auckland, K., Allen, A., Allen, S., Phipps, M. E., Huerta-Sanchez, E., Ioannidis, A. G., Mentzer, A. J., Moreno-Estrada, A. 2024; 16 (8)

Abstract

Papua New Guinea (PNG) hosts distinct environments mainly represented by the ecoregions of the Highlands and Lowlands that display increased altitude and a predominance of pathogens, respectively. Since its initial peopling approximately 50,000 years ago, inhabitants of these ecoregions might have differentially adapted to the environmental pressures exerted by each of them. However, the genetic basis of adaptation in populations from these areas remains understudied. Here, we investigated signals of positive selection in 62 highlanders and 43 lowlanders across 14 locations in the main island of PNG using whole-genome genotype data from the Oceanian Genome Variation Project (OGVP) and searched for signals of positive selection through population differentiation and haplotype-based selection scans. Additionally, we performed archaic ancestry estimation to detect selection signals in highlanders within introgressed regions of the genome. Among highland populations we identified candidate genes representing known biomarkers for mountain sickness (SAA4, SAA1, PRDX1, LDHA) as well as candidate genes of the Notch signaling pathway (PSEN1, NUMB, RBPJ, MAML3), a novel proposed pathway for high altitude adaptation in multiple organisms. We also identified candidate genes involved in oxidative stress, inflammation, and angiogenesis, processes inducible by hypoxia, as well as in components of the eye lens and the immune response. In contrast, candidate genes in the lowlands are mainly related to the immune response (HLA-DQB1, HLA-DQA2, TAAR6, TAAR9, TAAR8, RNASE4, RNASE6, ANG). Moreover, we find two candidate regions to be also enriched with archaic introgressed segments, suggesting that archaic admixture has played a role in the local adaptation of PNG populations.

View details for DOI 10.1093/gbe/evae161

View details for PubMedID 39173139
Evaluating disparities in receptor status, overall survival, and time to hormone therapy among women with breast cancer Taparra, K. A., DeVille, N. V., Melendez-Ramos, A., Blanco-Portillo, J., Ioannidis, A., Patel, M. I., Pollom, E. L., Horst, K. C. LIPPINCOTT WILLIAMS & WILKINS. 2024

View details for Web of Science ID 001275557402568
Genetic landscape of colorectal cancer (CRC) across genetic ancestries: Implications for early cancer detection (ECD). Ioannidis, A., Srinivasan, P., Bristow, S. L., Krinshpun, S., Solari, O., Rivero-Hinojosa, S., Aushev, V. N., Jurdi, A. A., Liu, M. C., Mitchell, B., Aleshin, A., Reiter, J. G., Nakamura, Y., Yoshino, T., Wall, J., Myer, P., Bustamante, C. D. LIPPINCOTT WILLIAMS & WILKINS. 2024

View details for Web of Science ID 001275557404030
Deep history of cultural and linguistic evolution among Central African hunter-gatherers. Nature human behaviour Padilla-Iglesias, C., Blanco-Portillo, J., Pricop, B., Ioannidis, A. G., Bickel, B., Manica, A., Vinicius, L., Migliano, A. B. 2024

Abstract

Human evolutionary history in Central Africa reflects a deep history of population connectivity. However, Central African hunter-gatherers (CAHGs) currently speak languages acquired from their neighbouring farmers. Hence it remains unclear which aspects of CAHG cultural diversity result from long-term evolution preceding agriculture and which reflect borrowing from farmers. On the basis of musical instruments, foraging tools, specialized vocabulary and genome-wide data from ten CAHG populations, we reveal evidence of large-scale cultural interconnectivity among CAHGs before and after the Bantu expansion. We also show that the distribution of hunter-gatherer musical instruments correlates with the oldest genomic segments in our sample predating farming. Music-related words are widely shared between western and eastern groups and likely precede the borrowing of Bantu languages. In contrast, subsistence tools are less frequently exchanged and may result from adaptation to local ecologies. We conclude that CAHG material culture and specialized lexicon reflect a long evolutionary history in Central Africa.

View details for DOI 10.1038/s41562-024-01891-y

View details for PubMedID 38802540

View details for PubMedCentralID 6092560
Comparison of colorectal cancer (CRC) characteristics across genetic ancestries: Implications for early cancer detection (ECD). Myer, P., Srinivasan, P., Bristow, S. L., Krinshpun, S., Solari, O., Aushev, V. N., Jurdi, A. A., Liu, M. C., Mitchell, B. L., Aleshin, A., Reiter, J. G., Weitzel, J. N., Nakamura, Y., Yoshino, T., Wall, J., Ioannidis, A., Bustamante, C. LIPPINCOTT WILLIAMS & WILKINS. 2024: 164

View details for DOI 10.1200/JCO.2024.42.3_suppl.164

View details for Web of Science ID 001266680500514
Machine Learning Strategies for Improved Phenotype Prediction in Underrepresented Populations. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Bonet, D., Levin, M., Montserrat, D. M., Ioannidis, A. G. 2024; 29: 404-418

Abstract

Precision medicine models often perform better for populations of European ancestry due to the over-representation of this group in the genomic datasets and large-scale biobanks from which the models are constructed. As a result, prediction models may misrepresent or provide less accurate treatment recommendations for underrepresented populations, contributing to health disparities. This study introduces an adaptable machine learning toolkit that integrates multiple existing methodologies and novel techniques to enhance the prediction accuracy for underrepresented populations in genomic datasets. By leveraging machine learning techniques, including gradient boosting and automated methods, coupled with novel population-conditional re-sampling techniques, our method significantly improves the phenotypic prediction from single nucleotide polymorphism (SNP) data for diverse populations. We evaluate our approach using the UK Biobank, which is composed primarily of British individuals with European ancestry, and a minority representation of groups with Asian and African ancestry. Performance metrics demonstrate substantial improvements in phenotype prediction for underrepresented groups, achieving prediction accuracy comparable to that of the majority group. This approach represents a significant step towards improving prediction accuracy amidst current dataset diversity challenges. By integrating a tailored pipeline, our approach fosters more equitable validity and utility of statistical genetics methods, paving the way for more inclusive models and outcomes.

View details for PubMedID 38160295
HyperFast: Instant Classification for Tabular Data Bonet, D., Montserrat, D., Giro-i-Nieto, X., Ioannidis, A. G. edited by Wooldridge, M., Dy, J., Natarajan, S. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 11114-11123

View details for Web of Science ID 001241513600041
Overcoming health disparities in precision medicine De la Vega, F. M., Barnes, K. C., Fox, K., Ioannidis, A., Kenny, E., Mathias, R. A., Pasaniuc, B. edited by Hunter, L., Altman, R. B., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2024: 322-326

View details for Web of Science ID 001258333100024
PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Comajoan Cara, M., Mas Montserrat, D., Ioannidis, A. G. 2024; 29: 327-340

Abstract

The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research. Our code is available at https://github.com/AI-sandbox/PopGenAdapt.

View details for PubMedID 38160290
Machine Learning Strategies for Improved Phenotype Prediction in Underrepresented Populations. bioRxiv : the preprint server for biology Bonet, D., Levin, M., Montserrat, D. M., Ioannidis, A. G. 2023

Abstract

Precision medicine models often perform better for populations of European ancestry due to the over-representation of this group in the genomic datasets and large-scale biobanks from which the models are constructed. As a result, prediction models may misrepresent or provide less accurate treatment recommendations for underrepresented populations, contributing to health disparities. This study introduces an adaptable machine learning toolkit that integrates multiple existing methodologies and novel techniques to enhance the prediction accuracy for underrepresented populations in genomic datasets. By leveraging machine learning techniques, including gradient boosting and automated methods, coupled with novel population-conditional re-sampling techniques, our method significantly improves the phenotypic prediction from single nucleotide polymorphism (SNP) data for diverse populations. We evaluate our approach using the UK Biobank, which is composed primarily of British individuals with European ancestry, and a minority representation of groups with Asian and African ancestry. Performance metrics demonstrate substantial improvements in phenotype prediction for underrepresented groups, achieving prediction accuracy comparable to that of the majority group. This approach represents a significant step towards improving prediction accuracy amidst current dataset diversity challenges. By integrating a tailored pipeline, our approach fosters more equitable validity and utility of statistical genetics methods, paving the way for more inclusive models and outcomes.

View details for DOI 10.1101/2023.10.12.561949

View details for PubMedID 37904983

View details for PubMedCentralID PMC10614800
Mexican Biobank advances population and medical genomics of diverse ancestries. Nature Sohail, M., Palma-Martínez, M. J., Chong, A. Y., Quinto-Cortés, C. D., Barberena-Jonas, C., Medina-Muñoz, S. G., Ragsdale, A., Delgado-Sánchez, G., Cruz-Hervert, L. P., Ferreyra-Reyes, L., Ferreira-Guerrero, E., Mongua-Rodríguez, N., Canizales-Quintero, S., Jimenez-Kaufmann, A., Moreno-Macías, H., Aguilar-Salinas, C. A., Auckland, K., Cortés, A., Acuña-Alonzo, V., Gignoux, C. R., Wojcik, G. L., Ioannidis, A. G., Fernández-Valverde, S. L., Hill, A. V., Tusié-Luna, M. T., Mentzer, A. J., Novembre, J., García-García, L., Moreno-Estrada, A. 2023

Abstract

Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.

View details for DOI 10.1038/s41586-023-06560-0

View details for PubMedID 37821706

View details for PubMedCentralID 3738819
PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations. bioRxiv : the preprint server for biology Cara, M. C., Montserrat, D. M., Ioannidis, A. G. 2023

Abstract

The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.

View details for DOI 10.1101/2023.10.10.561715

View details for PubMedID 37873492
Demographic history and genetic structure in pre-Hispanic Central Mexico. Science (New York, N.Y.) Villa-Islas, V., Izarraras-Gomez, A., Larena, M., Campos, E. M., Sandoval-Velasco, M., Rodríguez-Rodríguez, J. E., Bravo-Lopez, M., Moguel, B., Fregel, R., Garfias-Morales, E., Medina Tretmanis, J., Velázquez-Ramírez, D. A., Herrera-Muñóz, A., Sandoval, K., Nieves-Colón, M. A., Zepeda García Moreno, G., Villanea, F. A., Medina, E. F., Aguayo-Haro, R., Valdiosera, C., Ioannidis, A. G., Moreno-Estrada, A., Jay, F., Huerta-Sanchez, E., Moreno-Mayar, J. V., Sánchez-Quinto, F., Ávila-Arcos, M. C. 2023; 380 (6645): eadd6142

Abstract

Aridoamerica and Mesoamerica are two distinct cultural areas in northern and central Mexico, respectively, that hosted numerous pre-Hispanic civilizations between 2500 BCE and 1521 CE. The division between these regions shifted southward because of severe droughts ~1100 years ago, which allegedly drove a population replacement in central Mexico by Aridoamerican peoples. In this study, we present shotgun genome-wide data from 12 individuals and 27 mitochondrial genomes from eight pre-Hispanic archaeological sites across Mexico, including two at the shifting border of Aridoamerica and Mesoamerica. We find population continuity that spans the climate change episode and a broad preservation of the genetic structure across present-day Mexico for the past 2300 years. Lastly, we identify a contribution to pre-Hispanic populations of northern and central Mexico from two ancient unsampled "ghost" populations.

View details for DOI 10.1126/science.add6142

View details for PubMedID 37167382
Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease. Cell genomics Zhou, W., Kanai, M., Wu, K. H., Rasheed, H., Tsuo, K., Hirbo, J. B., Wang, Y., Bhattacharya, A., Zhao, H., Namba, S., Surakka, I., Wolford, B. N., Lo Faro, V., Lopera-Maya, E. A., Läll, K., Favé, M. J., Partanen, J. J., Chapman, S. B., Karjalainen, J., Kurki, M., Maasha, M., Brumpton, B. M., Chavan, S., Chen, T. T., Daya, M., Ding, Y., Feng, Y. A., Guare, L. A., Gignoux, C. R., Graham, S. E., Hornsby, W. E., Ingold, N., Ismail, S. I., Johnson, R., Laisk, T., Lin, K., Lv, J., Millwood, I. Y., Moreno-Grau, S., Nam, K., Palta, P., Pandit, A., Preuss, M. H., Saad, C., Setia-Verma, S., Thorsteinsdottir, U., Uzunovic, J., Verma, A., Zawistowski, M., Zhong, X., Afifi, N., Al-Dabhani, K. M., Al Thani, A., Bradford, Y., Campbell, A., Crooks, K., de Bock, G. H., Damrauer, S. M., Douville, N. J., Finer, S., Fritsche, L. G., Fthenou, E., Gonzalez-Arroyo, G., Griffiths, C. J., Guo, Y., Hunt, K. A., Ioannidis, A., Jansonius, N. M., Konuma, T., Lee, M. T., Lopez-Pineda, A., Matsuda, Y., Marioni, R. E., Moatamed, B., Nava-Aguilar, M. A., Numakura, K., Patil, S., Rafaels, N., Richmond, A., Rojas-Muñoz, A., Shortt, J. A., Straub, P., Tao, R., Vanderwerff, B., Vernekar, M., Veturi, Y., Barnes, K. C., Boezen, M., Chen, Z., Chen, C. Y., Cho, J., Smith, G. D., Finucane, H. K., Franke, L., Gamazon, E. R., Ganna, A., Gaunt, T. R., Ge, T., Huang, H., Huffman, J., Katsanis, N., Koskela, J. T., Lajonchere, C., Law, M. H., Li, L., Lindgren, C. M., Loos, R. J., MacGregor, S., Matsuda, K., Olsen, C. M., Porteous, D. J., Shavit, J. A., Snieder, H., Takano, T., Trembath, R. C., Vonk, J. M., Whiteman, D. C., Wicks, S. J., Wijmenga, C., Wright, J., Zheng, J., Zhou, X., Awadalla, P., Boehnke, M., Bustamante, C. D., Cox, N. J., Fatumo, S., Geschwind, D. H., Hayward, C., Hveem, K., Kenny, E. E., Lee, S., Lin, Y. F., Mbarek, H., Mägi, R., Martin, H. C., Medland, S. E., Okada, Y., Palotie, A. V., Pasaniuc, B., Rader, D. J., Ritchie, M. D., Sanna, S., Smoller, J. W., Stefansson, K., van Heel, D. A., Walters, R. G., Zöllner, S., Martin, A. R., Willer, C. J., Daly, M. J., Neale, B. M. 2022; 2 (10): 100192

Abstract

Biobanks facilitate genome-wide association studies (GWASs), which have mapped genomic loci across a range of human diseases and traits. However, most biobanks are primarily composed of individuals of European ancestry. We introduce the Global Biobank Meta-analysis Initiative (GBMI), a collaborative network of 23 biobanks from 4 continents representing more than 2.2 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWASs generated using harmonized genotypes and phenotypes from member biobanks for 14 exemplar diseases and endpoints. This strategy validates that GWASs conducted in diverse biobanks can be integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics. This collaborative effort improves GWAS power for diseases, benefits understudied diseases, and improves risk prediction while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of human diseases and traits.

View details for DOI 10.1016/j.xgen.2022.100192

View details for PubMedID 36777996

View details for PubMedCentralID PMC9903716
SALAI-Net: species-agnostic local ancestry inference network. Bioinformatics (Oxford, England) Oriol Sabat, B., Mas Montserrat, D., Giro-I-Nieto, X., Ioannidis, A. G. 2022; 38 (Supplement_2): ii27-ii33

Abstract

MOTIVATION: Local ancestry inference (LAI) is the high resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications. RESULTS: We present SALAI-Net, a portable statistical LAI method that can be applied on any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity by descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on whole-genome data of humans and we test these models' ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM memory than competing methods. AVAILABILITY AND IMPLEMENTATION: We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes). SUPPLEMENTARY INFORMATION: Supplementary data are available from Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btac464

View details for PubMedID 36124792
Validating and automating learning of cardiometabolic polygenic risk scores from direct-to-consumer genetic and phenotypic data: implications for scaling precision health research. Human genomics Lopez-Pineda, A., Vernekar, M., Moreno-Grau, S., Rojas-Munoz, A., Moatamed, B., Lee, M. T., Nava-Aguilar, M. A., Gonzalez-Arroyo, G., Numakura, K., Matsuda, Y., Ioannidis, A., Katsanis, N., Takano, T., Bustamante, C. D. 2022; 16 (1): 37

Abstract

INTRODUCTION: A major challenge to enabling precision health at a global scale is the bias between those who enroll in state sponsored genomic research and those suffering from chronic disease. More than 30 million people have been genotyped by direct-to-consumer (DTC) companies such as 23andMe, Ancestry DNA, and MyHeritage, providing a potential mechanism for democratizing access to medical interventions and thus catalyzing improvements in patient outcomes as the cost of data acquisition drops. However, much of these data are sequestered in the initial provider network, without the ability for the scientific community to either access or validate. Here, we present a novel geno-pheno platform that integrates heterogeneous data sources and applies learnings to common chronic disease conditions including Type 2 diabetes (T2D) and hypertension. METHODS: We collected genotyped data from a novel DTC platform where participants upload their genotype data files and were invited to answer general health questionnaires regarding cardiometabolic traits over a period of 6 months. Quality control, imputation, and genome-wide association studies were performed on this dataset, and polygenic risk scores were built in a case-control setting using the BASIL algorithm. RESULTS: We collected data on N=4,550 (389 cases / 4,161 controls) who reported being affected or previously affected for T2D and N=4,528 (1,027 cases / 3,501 controls) for hypertension. We identified 164 out of 272 variants showing identical effect direction to previously reported genome-significant findings in Europeans. Performance metric of the PRS models was AUC=0.68, which is comparable to previously published PRS models obtained with larger datasets including clinical biomarkers. DISCUSSION: DTC platforms have the potential of inverting research models of genome sequencing and phenotypic data acquisition. Quality control (QC) mechanisms proved to successfully enable traditional GWAS and PRS analyses. The direct participation of individuals has shown the potential to generate rich datasets enabling the creation of PRS cardiometabolic models. More importantly, federated learning of PRS from reuse of DTC data provides a mechanism for scaling precision health care delivery beyond the small number of countries who can afford to finance these efforts directly. CONCLUSIONS: The genetics of T2D and hypertension have been studied extensively in controlled datasets, and various polygenic risk scores (PRS) have been developed. We developed predictive tools for both phenotypes trained with heterogeneous genotypic and phenotypic data generated outside of the clinical environment and show that our methods can recapitulate prior findings with fidelity. From these observations, we conclude that it is possible to leverage DTC genetic repositories to identify individuals at risk of debilitating diseases based on their unique genetic landscape so that informed, timely clinical interventions can be incorporated.

View details for DOI 10.1186/s40246-022-00406-y

View details for PubMedID 36076307
Ancient DNA reveals five streams of migration into Micronesia and matrilocality in early Pacific seafarers. Science (New York, N.Y.) Liu, Y. C., Hunter-Anderson, R., Cheronet, O., Eakin, J., Camacho, F., Pietrusewsky, M., Rohland, N., Ioannidis, A., Athens, J. S., Douglas, M. T., Ikehara-Quebral, R. M., Bernardos, R., Culleton, B. J., Mah, M., Adamski, N., Broomandkhoshbacht, N., Callan, K., Lawson, A. M., Mandl, K., Michel, M., Oppenheimer, J., Stewardson, K., Zalzala, F., Kidd, K., Kidd, J., Schurr, T. G., Auckland, K., Hill, A. V., Mentzer, A. J., Quinto-Cortés, C. D., Robson, K., Kennett, D. J., Patterson, N., Bustamante, C. D., Moreno-Estrada, A., Spriggs, M., Vilar, M., Lipson, M., Pinhasi, R., Reich, D. 2022; 377 (6601): 72-79

Abstract

Micronesia began to be peopled earlier than other parts of Remote Oceania, but the origins of its inhabitants remain unclear. We generated genome-wide data from 164 ancient and 112 modern individuals. Analysis reveals five migratory streams into Micronesia. Three are East Asian related, one is Polynesian, and a fifth is a Papuan source related to mainland New Guineans that is different from the New Britain-related Papuan source for southwest Pacific populations but is similarly derived from male migrants ~2500 to 2000 years ago. People of the Mariana Archipelago may derive all of their precolonial ancestry from East Asian sources, making them the only Remote Oceanians without Papuan ancestry. Female-inherited mitochondrial DNA was highly differentiated across early Remote Oceanian communities but homogeneous within, implying matrilocal practices whereby women almost never raised their children in communities different from the ones in which they grew up.

View details for DOI 10.1126/science.abm6536

View details for PubMedID 35771911
Predicting Dog Phenotypes from Genotypes. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference Bartusiak, E. R., Barrabes, M., Rymbekova, A., Gimbernat-Mayol, J., Lopez, C., Barberis, L., Montserrat, D. M., Giro-I-Nieto, X., Ioannidis, A. G. 2022; 2022: 3558-3562

Abstract

We analyze dog genotypes (i.e., positions of dog DNA sequences that often vary between different dogs) in order to predict the corresponding phenotypes (i.e., unique observed characteristics). More specifically, given chromosome data from a dog, we aim to predict the breed, height, and weight. We explore a variety of linear and non-linear classification and regression techniques to accomplish these three tasks. We also investigate the use of a neural network (both in linear and non-linear modes) for breed classification and compare the performance to traditional statistical methods. We show that linear methods generally outperform or match the performance of non-linear methods for breed classification. However, we show that the reverse is true for height and weight regression. Finally, we evaluate the results of all of these methods based on the number of input features used in the analysis. We conduct experiments using different fractions of the full genomic sequences, resulting in input sequences ranging from 20 SNPs to ∼200k SNPs. In doing so, we explore the impact of using a very limited number of SNPs for prediction. Our experiments demonstrate that these phenotypes in dogs can be predicted with as few as 0.5% of randomly selected SNPs (i.e., 992 SNPs) and that dog breeds can be classified with 50% balanced accuracy with as few as 0.02% SNPs (i.e., 40 SNPs).

View details for DOI 10.1109/EMBC48229.2022.9870905

View details for PubMedID 36085664
Generative Moment Matching Networks for Genotype Simulation. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference Perera, M., Montserrat, D. M., Barrabes, M., Geleta, M., Giro-I-Nieto, X., Ioannidis, A. G. 2022; 2022: 1379-1383

Abstract

The generation of synthetic genomic sequences using neural networks has potential to ameliorate privacy and data sharing concerns and to mitigate potential bias within datasets due to under-representation of some population groups. However, there is not a consensus on which architectures, training procedures, and evaluation metrics should be used when simulating single nucleotide polymorphism (SNP) sequences with neural networks. In this paper, we explore the use of Generative Moment Matching Networks (GMMNs) for SNP simulation, we present some architectural and procedural changes to properly train the networks, and we introduce an evaluation scheme to qualitatively and quantitatively assess the quality of the simulated sequences.

View details for DOI 10.1109/EMBC48229.2022.9871045

View details for PubMedID 36086656
The genetic legacy of the Manila galleon trade in Mexico. Philosophical transactions of the Royal Society of London. Series B, Biological sciences Rodriguez-Rodriguez, J. E., Ioannidis, A. G., Medina-Munoz, S. G., Barberena-Jonas, C., Blanco-Portillo, J., Quinto-Cortes, C. D., Moreno-Estrada, A. 2022; 377 (1852): 20200419

Abstract

The population of Mexico has a considerable genetic substructure due to both its pre-Columbian diversity and due to genetic admixture from post-Columbian trans-oceanic migrations. The latter primarily originated in Europe and Africa, but also, to a lesser extent, in Asia. We analyze previously understudied genetic connections between Asia and Mexico to infer the timing and source of this genetic ancestry in Mexico. We identify the predominant origin within Southeast Asia, specifically western Indonesian and non-Negrito Filipino sources, and we date its arrival in Mexico to approximately 13 generations ago (1620 CE). This points to a genetic legacy from the seventeenth century Manila galleon trade between the colonial Spanish Philippines and the Pacific port of Acapulco. Indeed, within Mexico we observe the highest level of this trans-Pacific ancestry in Acapulco, located in the state of Guerrero. This colonial Spanish trade route from East Asia to Europe was centred on Mexico and appears in historical records, but its legacy has been largely ignored. Identities and stories were suppressed due to slavery, assimilation of the immigrants as 'Indios' and incomplete historical records. Here we characterize this understudied Mexican ancestry. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.

View details for DOI 10.1098/rstb.2020.0419

View details for PubMedID 35430879
Opportunities and challenges for the use of common controls in sequencing studies. Nature reviews. Genetics Wojcik, G. L., Murphy, J., Edelson, J. L., Gignoux, C. R., Ioannidis, A. G., Manning, A., Rivas, M. A., Buyske, S., Hendricks, A. E. 2022

Abstract

Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.

View details for DOI 10.1038/s41576-022-00487-4

View details for PubMedID 35581355
Bayesian model comparison for rare-variant association studies. American journal of human genetics Venkataraman, G. R., DeBoever, C., Tanigawa, Y., Aguirre, M., Ioannidis, A. G., Mostafavi, H., Spencer, C. C., Poterba, T., Bustamante, C. D., Daly, M. J., Pirinen, M., Rivas, M. A. 2021

Abstract

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.

View details for DOI 10.1016/j.ajhg.2021.11.005

View details for PubMedID 34822764
High Resolution Ancestry Deconvolution for Next Generation Genomic Data bioRxiv Hilmarsson, H., Kumar, A. S., Rastogi, R., Bustamante, C. D., Mas Montserrat, D., Ioannidis, A. G. 2021
Neural ADMIXTURE: rapid population clustering with autoencoders bioRxiv Dominguez Mantes, A., Mas Montserrat, D., Bustamante, C., Giró-i-Nieto, X., Ioannidis, A. G. 2021
Discovering prescription patterns in pediatric acute-onset neuropsychiatric syndrome patients. Journal of biomedical informatics Lopez Pineda, A., Pourshafeie, A., Ioannidis, A., McCloskey Leibold, C., Chan, A. L., Bustamante, C. D., Frankovich, J., Wojcik, G. L. 2020: 103664

Abstract

OBJECTIVE: Pediatric acute-onset neuropsychiatric syndrome (PANS) is a complex neuropsychiatric syndrome characterized by an abrupt onset of obsessive-compulsive symptoms and/or severe eating restrictions, along with at least two concomitant debilitating cognitive, behavioral, or neurological symptoms. A wide range of pharmacological interventions, along with behavioral and environmental modifications and psychotherapies, have been adopted to treat symptoms and underlying etiologies. Our goal was to develop a data-driven approach to identify treatment patterns in this cohort. MATERIALS AND METHODS: In this cohort study, we extracted medical prescription histories from electronic health records. We developed a modified dynamic programming approach to perform global alignment of those medication histories. Our approach is unique since it considers time gaps in prescription patterns as part of the similarity strategy. RESULTS: This study included 43 consecutive new-onset pre-pubertal patients who had at least 3 clinic visits. Our algorithm identified six clusters with distinct medication usage histories, which may represent clinicians' practice of treating PANS of different severities and etiologies, i.e., two most severe groups requiring high dose intravenous steroids; two arthritic or inflammatory groups requiring prolonged nonsteroidal anti-inflammatory drug (NSAID) treatment; and two mild relapsing/remitting groups treated with a short course of NSAID. The psychometric scores as outcomes in each cluster generally improved within the first two years. DISCUSSION AND CONCLUSION: Our algorithm shows potential to improve our knowledge of treatment patterns in the PANS cohort, while helping clinicians understand how patients respond to a combination of drugs.

View details for DOI 10.1016/j.jbi.2020.103664

View details for PubMedID 33359113
LAI-NET: LOCAL-ANCESTRY INFERENCE WITH NEURAL NETWORKS Montserrat, D., Bustamante, C., Ioannidis, A., IEEE IEEE. 2020: 1314–18

View details for Web of Science ID 000615970401111
Class-Conditional VAE-GAN for Local-Ancestry Simulation MLCB Proceedings Mas Montserrat, D., Bustamante, C., Ioannidis, A. G. 2019
Reconstructing admixture and migration dynamics of post-contact Mexico Esteban Rodriguez-Rodriguez, J., Blanco-Portillo, J., Ioannidis, A., Moreno-Estrada, A. WILEY. 2018: 228

View details for Web of Science ID 000430656803170
Integrated Power Divider for Superconducting Digital Circuits IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY Oberg, O. T., Herr, Q. P., Ioannidis, A. G., Herr, A. Y. 2011; 21 (3): 571–74

View details for DOI 10.1109/TASC.2010.2086415

View details for Web of Science ID 000291050500113
Digital circuits using self-shunted Nb/NbxSi1-x/Nb Josephson junctions APPLIED PHYSICS LETTERS Olaya, D., Dresselhaus, P. D., Benz, S. P., Herr, A., Herr, Q. P., Ioannidis, A. G., Miller, D. L., Kleinsasser, A. W. 2010; 96 (21)

View details for DOI 10.1063/1.3432065

View details for Web of Science ID 000278183200086
