Dr. Nilah Ioannidis is a postdoc in the Department of Biomedical Data Science working on statistical and computational methods for interpreting personal genomes. She develops machine learning tools to predict the clinical significance of rare variants of unknown significance from whole genome sequencing studies, as well as statistical methods to link personal genetic variation with personal transcriptome variation. During her PhD in Biophysics at Harvard University, she worked in the Department of Biological Engineering at M.I.T. and developed methods using hidden Markov modeling and Bayesian inference to analyze the dynamics of intracellular particles. She previously served as Research Director at the Jain Foundation, focused on the rare genetic disease dysferlinopathy, and held internships at the National Academy of Sciences and the journal Science.
Honors & Awards
NIH K99/R00 Pathway to Independence Award, National Institutes of Health (2017 - present)
NIH NRSA F32 Individual Postdoctoral Fellowship, National Institutes of Health (2014 - 2017)
CEHG Postdoctoral Fellowship, Stanford University (2013 - 2014)
NSF Graduate Research Fellowship, National Science Foundation (2008 - 2010)
Education & Certifications
Ph.D., Harvard University, Biophysics (2013)
M.Phil., University of Cambridge, Chemistry (2005)
B.A., Harvard College, Biochemical Sciences, summa cum laude (2004)
Gene expression imputation identifies candidate genes and susceptibility loci associated with cutaneous squamous cell carcinoma.
2018; 9 (1): 4264
Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6,891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing 33 genes whose imputed expression levels are associated with cSCC at false discovery rate < 10% in the GERA cohort and validate 15 of these candidate genes at Bonferroni significance in the 23andMe dataset, including eight genes in five novel susceptibility loci and seven genes in four previously associated loci. These results suggest genetic mechanisms contributing to cSCC risk and illustrate advantages and disadvantages of TWAS as a supplement to traditional GWAS analyses.
View details for DOI 10.1038/s41467-018-06149-6
View details for PubMedID 30323283
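The discovery-validation design in the abstract above combines two multiple-testing thresholds: a false discovery rate cutoff for discovery and a Bonferroni cutoff for validation. The following is a minimal sketch of both rules, assuming the Benjamini-Hochberg procedure for FDR control (the abstract does not name the procedure used). The function names and p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return indices of hypotheses rejected at the given false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


def bonferroni(p_values, alpha=0.05):
    """Return indices of hypotheses with p <= alpha / number of tests."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Hypothetical discovery-cohort p-values, one per gene (not from the study).
discovery_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
candidates = benjamini_hochberg(discovery_p, fdr=0.10)

# Hypothetical validation-cohort p-values for the candidate genes only.
validation_p = [0.0004, 0.02, 0.011, 0.6]
validated = bonferroni(validation_p, alpha=0.05)
```

Because only the discovery-stage candidates are carried forward, the Bonferroni correction in the second stage divides by the number of candidates rather than by the number of genes originally tested.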
FIRE: functional inference of genetic variants that regulate gene expression.
Bioinformatics (Oxford, England)
2017; 33 (24): 3895–3901
Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://email@example.com. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btx534
View details for PubMedID 28961785
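Several abstracts on this page report performance as an area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name, scores, and labels are hypothetical; this is not the evaluation code used in any of the listed papers.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical variant scores; label 1 marks a true positive (e.g. an eQTL SNV).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of 9 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of examples, so a rank-based formula is preferable for large datasets.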
REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants
AMERICAN JOURNAL OF HUMAN GENETICS
2016; 99 (4): 877-885
The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
View details for DOI 10.1016/j.ajhg.2016.08.016
View details for PubMedID 27666373
Inferring transient particle transport dynamics in live cells
2015; 12 (9): 838-?
Live-cell imaging and particle tracking provide rich information on mechanisms of intracellular transport. However, trajectory analysis procedures to infer complex transport dynamics involving stochastic switching between active transport and diffusive motion are lacking. We applied Bayesian model selection to hidden Markov modeling to infer transient transport states from trajectories of mRNA-protein complexes in live mouse hippocampal neurons and metaphase kinetochores in dividing human cells. The software is available at http://hmm-bayes.org/.
View details for DOI 10.1038/NMETH.3483
View details for Web of Science ID 000360586700028
View details for PubMedID 26192083
Bayesian Approach to MSD-Based Analysis of Particle Motion in Live Cells
2012; 103 (3): 616-626
Quantitative tracking of particle motion using live-cell imaging is a powerful approach to understanding the mechanism of transport of biological molecules, organelles, and cells. However, inferring complex stochastic motion models from single-particle trajectories in an objective manner is nontrivial due to noise from sampling limitations and biological heterogeneity. Here, we present a systematic Bayesian approach to multiple-hypothesis testing of a general set of competing motion models based on particle mean-square displacements that automatically classifies particle motion, properly accounting for sampling limitations and correlated noise while appropriately penalizing model complexity according to Occam's Razor to avoid over-fitting. We test the procedure rigorously using simulated trajectories for which the underlying physical process is known, demonstrating that it chooses the simplest physical model that explains the observed data. Further, we show that computed model probabilities provide a reliability test for the downstream biological interpretation of associated parameter values. We subsequently illustrate the broad utility of the approach by applying it to disparate biological systems including experimental particle trajectories from chromosomes, kinetochores, and membrane receptors undergoing a variety of complex motions. This automated and objective Bayesian framework easily scales to large numbers of particle trajectories, making it ideal for classifying the complex motion of large numbers of single molecules and cells from high-throughput screens, as well as single-cell-, tissue-, and organism-level studies.
View details for DOI 10.1016/j.bpj.2012.06.029
View details for Web of Science ID 000307427700028
View details for PubMedID 22947879
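The Bayesian framework in the abstract above compares motion models using particle mean-square displacements (MSD). A minimal sketch of the underlying quantity, the time-averaged MSD of a single 2D trajectory, is shown here. It covers only the MSD computation, not the Bayesian model selection described in the paper, and the function name and trajectory are hypothetical.

```python
def mean_square_displacement(track, max_lag):
    """Time-averaged MSD of one 2D trajectory for lags 1..max_lag (in frames)."""
    if not 1 <= max_lag < len(track):
        raise ValueError("max_lag must be between 1 and len(track) - 1")
    msd = []
    for lag in range(1, max_lag + 1):
        # Squared displacement between every pair of points separated by `lag` frames.
        sq = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        msd.append(sum(sq) / len(sq))
    return msd


# Hypothetical trajectory: one unit per frame along x (pure directed motion).
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
msd = mean_square_displacement(track, max_lag=3)
```

For constant-velocity motion the MSD grows quadratically with lag, whereas for pure diffusion it grows linearly. Distinguishing such shapes from noisy, finite trajectories is the classification problem the abstract addresses.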
Intracellular Transport by an Anchored Homogeneously Contracting F-Actin Meshwork
2011; 21 (7): 606-611
Actin-based contractility orchestrates changes in cell shape underlying cellular functions ranging from division to migration and wound healing. Actin also functions in intracellular transport, with the prevailing view that filamentous actin (F-actin) cables serve as tracks for motor-driven transport of cargo. We recently discovered an alternate mode of intracellular transport in starfish oocytes involving a contractile F-actin meshwork that mediates chromosome congression. The mechanisms by which this meshwork contracts and translates its contractile activity into directional transport of chromosomes remained open questions. Here, we use live-cell imaging with quantitative analysis of chromosome trajectories and meshwork velocities to show that the 3D F-actin meshwork contracts homogeneously and isotropically throughout the nuclear space. Centrifugation experiments reveal that this homogeneous contraction is translated into asymmetric, directional transport by mechanical anchoring of the meshwork to the cell cortex. Finally, by injecting inert particles of different sizes, we show that this directional transport activity is size-selective and transduced to chromosomal cargo at least in part by steric trapping or "sieving." Taken together, these results reveal mechanistic design principles of a novel and potentially versatile mode of intracellular transport based on sieving by an anchored homogeneously contracting F-actin meshwork.
View details for DOI 10.1016/j.cub.2011.03.002
View details for Web of Science ID 000289662600025
View details for PubMedID 21439825
A Prediction Tool to Facilitate Risk-Stratified Screening for Squamous Cell Skin Cancer.
The Journal of investigative dermatology
2018; 138 (12): 2589–94
Cutaneous squamous cell cancers (cSCCs) present an under-recognized health issue among non-Hispanic whites, one that is likely to increase as populations age. cSCC risks vary considerably among non-Hispanic whites, and this heterogeneity indicates the need for risk-stratified screening strategies that are guided by patients' personal characteristics and clinical histories. Here we describe cSCCscore, a prediction tool that uses patients' covariates and clinical histories to assign them personal probabilities of developing cSCCs within 3 years after risk assessment. cSCCscore uses a statistical model for the occurrence and timing of a patient's cSCCs, whose parameters we estimated using cohort data from 66,995 patients in the Kaiser Permanente Northern California healthcare system. We found that patients' covariates and histories explained approximately 75% of their interpersonal cSCC risk variation. Using cross-validated performance measures, we also found cSCCscore's predictions to be moderately well calibrated to the patients' observed cSCC incidence. Moreover, cSCCscore discriminated well between patients who subsequently did and did not develop a new primary cSCC within 3 years after risk assignment, with area under the receiver operating characteristic curve of approximately 85%. Thus, cSCCscore can facilitate more informed management of non-Hispanic white patients at cSCC risk. cSCCscore's predictions are available at https://researchapps.github.io/cSCCscore/.
View details for DOI 10.1016/j.jid.2018.03.1528
View details for PubMedID 30472995
Genetic variants in the HLA class II region associated with risk of cutaneous squamous cell carcinoma.
Cancer immunology, immunotherapy : CII
BACKGROUND: The immune system has been implicated in the pathophysiology of cutaneous squamous cell carcinoma (cSCC) as evidenced by the substantially increased risk of cSCC in immunosuppressed individuals. Associations between cSCC risk and single nucleotide polymorphisms (SNPs) in the HLA region have been identified by genome-wide association studies (GWAS). The translation of the associated HLA SNPs to structural amino acid changes in HLA molecules has not been previously elucidated. METHODS: Using data from a GWAS that included 7,238 cSCC cases and 56,961 controls of non-Hispanic white ancestry, we imputed classical alleles and corresponding amino acid changes in HLA genes. Logistic regression models were used to examine associations between cSCC risk and genotyped or imputed SNPs, classical HLA alleles, and amino acid changes. RESULTS: Among the genotyped SNPs, cSCC risk was associated with rs28535317 (OR=1.20, p=9.88×10⁻¹¹) corresponding to an amino-acid change from phenylalanine to leucine at codon 26 of HLA-DRB1 (OR=1.17, p=2.48×10⁻¹⁰). An additional independent association was observed for a threonine to isoleucine change at codon 107 of HLA-DQA1 (OR=1.14, p=2.34×10⁻⁹). Among the classical HLA alleles, cSCC was associated with DRB1*01 (OR=1.18, p=5.86×10⁻¹⁰). Conditional analyses revealed additional independent cSCC associations with DQA1*05:01 and DQA1*05:05. Extended haplotype analysis was used to complement the imputed haplotypes, which identified three extended haplotypes in the HLA-DR and HLA-DQ regions. CONCLUSIONS: Associations with specific HLA-DR and -DQ alleles are likely to explain previously observed GWAS signals in the HLA region associated with cSCC risk.
View details for DOI 10.1007/s00262-018-2168-2
View details for PubMedID 29754218
Identification of Susceptibility Loci for Cutaneous Squamous Cell Carcinoma
JOURNAL OF INVESTIGATIVE DERMATOLOGY
2016; 136 (5): 930-937
We report a genome-wide association study of cutaneous squamous cell carcinoma conducted among non-Hispanic white members of the Kaiser Permanente Northern California health care system. The study includes a genome-wide screen of 61,457 members (6,891 cases and 54,566 controls) genotyped on the Affymetrix Axiom European array and a replication phase involving an independent set of 6,410 additional members (810 cases and 5,600 controls). Combined analysis of screening and replication phases identified 10 loci containing single-nucleotide polymorphisms (SNPs) with P-values < 5 × 10⁻⁸. Six loci contain genes in the pigmentation pathway; SNPs at these loci appear to modulate squamous cell carcinoma risk independently of the pigmentation phenotypes. Another locus contains HLA class II genes studied in relation to elevated squamous cell carcinoma risk following immunosuppression. SNPs at the remaining three loci include an intronic SNP in FOXP1 at locus 3p13, an intergenic SNP at 3q28 near TP63, and an intergenic SNP at 9p22 near BNC2. These findings provide insights into the genetic factors accounting for inherited squamous cell carcinoma susceptibility.
View details for DOI 10.1016/j.jid.2016.01.013
View details for Web of Science ID 000375980600013
View details for PubMedID 26829030
View details for PubMedCentralID PMC4842155
Mapping translation 'hot-spots' in live cells by tracking single molecules of mRNA and ribosomes
Messenger RNA localization is important for cell motility by local protein translation. However, while single mRNAs can be imaged and their movements tracked in single cells, it has not yet been possible to determine whether these mRNAs are actively translating. Therefore, we imaged single β-actin mRNAs tagged with MS2 stem loops colocalizing with labeled ribosomes to determine when polysomes formed. A dataset of tracking information consisting of thousands of trajectories per cell demonstrated that mRNAs co-moving with ribosomes have significantly different diffusion properties from non-translating mRNAs that were exposed to translation inhibitors. These data indicate that ribosome load changes mRNA movement and therefore highly translating mRNAs move slower. Importantly, β-actin mRNA near focal adhesions exhibited sub-diffusive corralled movement characteristic of increased translation. This method can identify where ribosomes become engaged for local protein production and how spatial regulation of mRNA-protein interactions mediates cell directionality.
View details for DOI 10.7554/eLife.10415
View details for Web of Science ID 000373789000001
View details for PubMedID 26760529
An Arp2/3 Nucleated F-Actin Shell Fragments Nuclear Membranes at Nuclear Envelope Breakdown in Starfish Oocytes
2014; 24 (12): 1421-1428
Animal cells disassemble and reassemble their nuclear envelopes (NEs) upon each division. Nuclear envelope breakdown (NEBD) serves as a major regulatory mechanism by which mixing of cytoplasmic and nuclear compartments drives the complete reorganization of cellular architecture, committing the cell for division. Breakdown is initiated by phosphorylation-driven partial disassembly of the nuclear pore complexes (NPCs), increasing their permeability but leaving the overall NE structure intact. Subsequently, the NE is rapidly broken into membrane fragments, defining the transition from prophase to prometaphase and resulting in complete mixing of cyto- and nucleoplasm. However, the mechanism underlying this rapid NE fragmentation remains largely unknown. Here, we show that NE fragmentation during NEBD in starfish oocytes is driven by an Arp2/3 complex-nucleated F-actin "shell" that transiently polymerizes on the inner surface of the NE. Blocking the formation of this F-actin shell prevents membrane fragmentation and delays entry of large cytoplasmic molecules into the nucleus. We observe spike-like protrusions extending from the F-actin shell that appear to "pierce" the NE during the fragmentation process. Finally, we show that NE fragmentation is essential for successful reproduction, because blocking this process in meiosis leads to formation of aneuploid eggs.
View details for DOI 10.1016/j.cub.2014.05.019
View details for Web of Science ID 000337648200034
View details for PubMedID 24909322
The Kinetochore-Bound Ska1 Complex Tracks Depolymerizing Microtubules and Binds to Curved Protofilaments
2012; 23 (5): 968-980
To ensure equal chromosome segregation during mitosis, the macromolecular kinetochore must remain attached to depolymerizing microtubules, which drive chromosome movements. How kinetochores associate with depolymerizing microtubules, which undergo dramatic structural changes forming curved protofilaments, has yet to be defined in vertebrates. Here, we demonstrate that the conserved kinetochore-localized Ska1 complex tracks with depolymerizing microtubule ends and associates with both the microtubule lattice and curved protofilaments. In contrast, the Ndc80 complex, a central player in the kinetochore-microtubule interface, binds only to the straight microtubule lattice and lacks tracking activity. We demonstrate that the Ska1 complex imparts its tracking capability to the Ndc80 complex. Finally, we present a structure of the Ska1 microtubule-binding domain that reveals its interaction with microtubules and its regulation by Aurora B. This work defines an integrated kinetochore-microtubule interface formed by the Ska1 and Ndc80 complexes that associates with depolymerizing microtubules, potentially by interacting with curved microtubule protofilaments.
View details for DOI 10.1016/j.devcel.2012.09.012
View details for Web of Science ID 000311134100013
View details for PubMedID 23085020
Bayesian Approach to the Analysis of Fluorescence Correlation Spectroscopy Data II: Application to Simulated and In Vitro Data
2012; 84 (9): 3880-3888
Fluorescence correlation spectroscopy (FCS) is a powerful approach to characterizing the binding and transport dynamics of macromolecules. The unbiased interpretation of FCS data relies on the evaluation of multiple competing hypotheses to describe an underlying physical process under study, which is typically unknown a priori. Bayesian inference provides a convenient framework for this evaluation based on the temporal autocorrelation function (TACF), as previously shown theoretically using model TACF curves (He, J., Guo, S., and Bathe, M. Anal. Chem. 2012, 84). Here, we apply this procedure to simulated and experimentally measured photon-count traces analyzed using a multitau correlator, which results in complex noise properties in TACF curves that cannot be modeled easily. As a critical component of our technique, we develop two means of estimating the noise in TACF curves based either on multiple independent TACF curves themselves or a single raw underlying intensity trace, including a general procedure to ensure that independent, uncorrelated samples are used in the latter approach. Using these noise definitions, we demonstrate that the Bayesian approach selects the simplest hypothesis that describes the FCS data based on sampling and signal limitations, naturally avoiding overfitting. Further, we show that model probabilities computed using the Bayesian approach provide a reliability test for the downstream interpretation of model parameter values estimated from FCS data. Our procedure is generally applicable to FCS and image correlation spectroscopy and therefore provides an important advance in the application of these methods to the quantitative biophysical investigation of complex analytical and biological systems.
View details for DOI 10.1021/ac2034375
View details for Web of Science ID 000303349200005
View details for PubMedID 22455375
UMD-DYSF, a novel locus specific database for the compilation and interactive analysis of mutations in the dysferlin gene
2012; 33 (3): E2317-E2331
Mutations in the dysferlin gene (DYSF) lead to a complete or partial absence of the dysferlin protein in skeletal muscles and are at the origin of dysferlinopathies, a heterogeneous group of rare autosomal recessive inherited neuromuscular disorders. As a step towards a better understanding of the DYSF mutational spectrum, and towards possible inclusion of patients in future therapeutic clinical trials, we set up the Universal Mutation Database for Dysferlin (UMD-DYSF), a Locus-Specific Database developed with the UMD® software. The main objective of UMD-DYSF is to provide an updated compilation of mutational data and relevant interactive tools for the analysis of DYSF sequence variants, for diagnostic and research purposes. In particular, specific algorithms can facilitate the interpretation of newly identified intronic, missense- or isosemantic-exonic sequence variants, a problem encountered recurrently during genetic diagnosis in dysferlinopathies. UMD-DYSF v1.0 is freely accessible at www.umd.be/DYSF/. It contains a total of 742 mutational entries corresponding to 266 different disease-causing mutations identified in 558 patients worldwide diagnosed with dysferlinopathy. This article presents for the first time a comprehensive analysis of the dysferlin mutational spectrum based on all compiled DYSF disease-causing mutations reported in the literature to date, and using the main bioinformatics tools offered in UMD-DYSF.
View details for DOI 10.1002/humu.22015
View details for Web of Science ID 000300706000021
View details for PubMedID 22213072
Two-color fluorescence analysis of individual virions determines the distribution of the copy number of proteins in herpes simplex virus particles
2007; 93 (4): 1329-1337
We present a single virion method to determine absolute distributions of copy number in the protein composition of viruses and apply it to herpes simplex virus type 1. Using two-color coincidence fluorescence spectroscopy, we determine the virion-to-virion variability in copy numbers of fluorescently labeled tegument and envelope proteins relative to a capsid protein by analyzing fluorescence intensity ratios for ensembles of individual dual-labeled virions and fitting the resulting histogram of ratios. Using EYFP-tagged capsid protein VP26 as a reference for fluorescence intensity, we are able to calculate the mean and also, for the first time to our knowledge, the variation in numbers of gD, VP16, and VP22 tegument. The measurement of the number of glycoprotein D molecules was in good agreement with independent measurements of average numbers of these glycoproteins in bulk virus preparations, validating the method. The accuracy, straightforward data processing, and high throughput of this technique make it widely applicable to the analysis of the molecular composition of large complexes in general, and it is particularly suited to providing insights into virus structure, assembly, and infectivity.
View details for DOI 10.1529/biophysj.107.106351
View details for Web of Science ID 000248208800026
View details for PubMedID 17513380
Unique resistance of I/LnJ mice to a retrovirus is due to sustained interferon gamma-dependent production of virus-neutralizing antibodies
JOURNAL OF EXPERIMENTAL MEDICINE
2003; 197 (2): 233-243
Selection of immune escape variants impairs the ability of the immune system to sustain an efficient antiviral response and to control retroviral infections. Like other retroviruses, mouse mammary tumor virus (MMTV) is not efficiently eliminated by the immune system of susceptible mice. In contrast, MMTV-infected I/LnJ mice are capable of producing IgG2a virus-neutralizing antibodies, sustain this response throughout their life, and secrete antibody-coated virions into the milk, thereby preventing infection of their progeny. Antibodies were produced in response to several MMTV variants and were cross-reactive to them. Resistance to MMTV infection was recessive and was dependent on interferon (IFN)-gamma production, because I/LnJ mice with targeted deletion of the IFN-gamma gene failed to produce any virus-neutralizing antibodies. These findings reveal a novel mechanism of resistance to retroviral infection that is based on a robust and sustained IFN-gamma-dependent humoral immune response.
View details for DOI 10.1084/jem.20021499
View details for Web of Science ID 000180688900010
View details for PubMedID 12538662