Honors & Awards

  • Kwanjeong Educational Foundation Scholarship, Kwanjeong Educational Foundation (2017 - 2022)

Education & Certifications

  • Master of Science, Stanford University, BIOE-MS (2020)
  • B.S., Seoul National University, Chemical and Biological Engineering, Biological Sciences, Computer Science and Engineering (2017)

All Publications

  • Cross-evaluation of E. coli's operon structures via a whole-cell model suggests alternative cellular benefits for low- versus high-expressing operons. Cell systems Sun, G., DeFelice, M. M., Gillies, T. E., Ahn-Horst, T. A., Andrews, C. J., Krummenacker, M., Karp, P. D., Morrison, J. H., Covert, M. W. 2024


    Many bacteria use operons to coregulate genes, but it remains unclear how operons benefit bacteria. We integrated E. coli's 788 polycistronic operons and 1,231 transcription units into an existing whole-cell model and found inconsistencies between the proposed operon structures and the RNA-seq read counts that the model was parameterized from. We resolved these inconsistencies through iterative, model-guided corrections to both datasets, including the correction of RNA-seq counts of short genes that were misreported as zero by existing alignment algorithms. The resulting model suggested two main modes by which operons benefit bacteria. For 86% of low-expression operons, adding operons increased the co-expression probabilities of their constituent proteins, whereas for 92% of high-expression operons, adding operons resulted in more stable expression ratios between the proteins. These simulations underscored the need for further experimental work on how operons reduce noise and synchronize both the expression timing and the quantity of constituent genes. A record of this paper's transparent peer review process is included in the supplemental information.

    View details for DOI 10.1016/j.cels.2024.02.002

    View details for PubMedID 38417437

  • The EcoCyc Database (2023). EcoSal Plus Karp, P. D., Paley, S., Caspi, R., Kothari, A., Krummenacker, M., Midford, P. E., Moore, L. R., Subhraveti, P., Gama-Castro, S., Tierrafria, V. H., Lara, P., Muñiz-Rascado, L., Bonavides-Martinez, C., Santos-Zavaleta, A., Mackie, A., Sun, G., Ahn-Horst, T. A., Choi, H., Covert, M. W., Collado-Vides, J., Paulsen, I. 2023: eesp00022023


    EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.

    View details for PubMedID 37220074

  • An expanded whole-cell model of E. coli links cellular physiology with mechanisms of growth rate control. NPJ systems biology and applications Ahn-Horst, T. A., Mille, L. S., Sun, G., Morrison, J. H., Covert, M. W. 2022; 8 (1): 30


    Growth and environmental responses are essential for living organisms to survive and adapt to constantly changing environments. In order to simulate new conditions and capture dynamic responses to environmental shifts in a developing whole-cell model of E. coli, we incorporated additional regulation, including dynamics of the global regulator guanosine tetraphosphate (ppGpp), along with dynamics of amino acid biosynthesis and translation. With the model, we show that under perturbed ppGpp conditions, small molecule feedback inhibition pathways, in addition to regulation of expression, play a role in ppGpp regulation of growth. We also found that simulations with dysregulated amino acid synthesis pathways provide average amino acid concentration predictions that are comparable to experimental results but on the single-cell level, concentrations unexpectedly show regular fluctuations. Additionally, during both an upshift and downshift in nutrient availability, the simulated cell responds similarly with a transient increase in the mRNA:rRNA ratio. This additional simulation functionality should support a variety of new applications and expansions of the E. coli Whole-Cell Modeling Project.

    View details for DOI 10.1038/s41540-022-00242-9

    View details for PubMedID 35986058

  • The E. coli Whole-Cell Modeling Project. EcoSal Plus Sun, G., Ahn-Horst, T. A., Covert, M. W. 2021: eESP00012020


    The Escherichia coli whole-cell modeling project seeks to create the most detailed computational model of an E. coli cell in order to better understand and predict the behavior of this model organism. Details about the approach, framework, and current version of the model are discussed. Currently, the model includes the functions of 43% of characterized genes, with ongoing efforts to include additional data and mechanisms. As additional information is incorporated in the model, its utility and predictive power will continue to increase, which means that discovery efforts can be accelerated by community involvement in the generation and inclusion of data. This project will be an invaluable resource to the E. coli community that could be used to verify expected physiological behavior, to predict new outcomes and testable hypotheses for more efficient experimental design iterations, and to evaluate heterogeneous data sets in the context of each other through deep curation.

    View details for DOI 10.1128/ecosalplus.ESP-0001-2020

    View details for PubMedID 34242084

  • Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation. Science (New York, N.Y.) Macklin, D. N., Ahn-Horst, T. A., Choi, H., Ruggero, N. A., Carrera, J., Mason, J. C., Sun, G., Agmon, E., DeFelice, M. M., Maayan, I., Lane, K., Spangler, R. K., Gillies, T. E., Paull, M. L., Akhter, S., Bray, S. R., Weaver, D. S., Keseler, I. M., Karp, P. D., Morrison, J. H., Covert, M. W. 2020; 369 (6502)


    The extensive heterogeneity of biological data poses challenges to analysis and interpretation. Construction of a large-scale mechanistic model of Escherichia coli enabled us to integrate and cross-evaluate a massive, heterogeneous dataset based on measurements reported by various groups over decades. We identified inconsistencies with functional consequences across the data, including that the total output of the ribosomes and RNA polymerases described by data is not sufficient for a cell to reproduce measured doubling times, that measured metabolic parameters are neither fully compatible with each other nor with overall growth, and that essential proteins are absent during the cell cycle, and that the cell is robust to this absence. Finally, considering these data as a whole leads to successful predictions of new experimental outcomes, in this case protein half-lives.

    View details for DOI 10.1126/science.aav3751

    View details for PubMedID 32703847

  • BeReTa: a systematic method for identifying target transcriptional regulators to enhance microbial production of chemicals. Bioinformatics Kim, M., Sun, G., Lee, D., Kim, B. 2017; 33 (1): 87–94


    Modulation of regulatory circuits governing the metabolic processes is a crucial step for developing microbial cell factories. Despite the prevalence of in silico strain design algorithms, most of them are not capable of predicting required modifications in regulatory networks. Although a few algorithms may predict relevant targets for transcriptional regulator (TR) manipulations, they have limited reliability and applicability due to their high dependency on the availability of integrated metabolic/regulatory models. We present BeReTa (Beneficial Regulator Targeting), a new algorithm for prioritization of TR manipulation targets, which makes use of unintegrated network models. BeReTa identifies TR manipulation targets by evaluating regulatory strengths of interactions and beneficial effects of reactions, and subsequently assigning beneficial scores for the TRs. We demonstrate that BeReTa can predict both known and novel TR manipulation targets for enhanced production of various chemicals in Escherichia coli. Furthermore, through a case study of antibiotics production in Streptomyces coelicolor, we successfully demonstrate its wide applicability to even less-studied organisms. To the best of our knowledge, BeReTa is the first strain design algorithm exclusively designed for predicting TR manipulation targets. MATLAB code is available at https://github.com/kms1041/BeReTa (github). Contact: byungkim@snu.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btw557

    View details for Web of Science ID 000397091600012

    View details for PubMedID 27605107