Tatsunori Hashimoto's Profile | Stanford Profiles

Academic Appointments

Assistant Professor, Computer Science
Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI)

Contact

Academic
thashim@stanford.edu
University - Faculty
Department: Computer Science
Position: Asst Professor
- 353 JANE STANFORD WAY
- STANFORD, California 94305

Additional Info

Mail Code: 9020
ORCID:
https://orcid.org/0000-0003-0521-5855

2025-26 Courses

- Language Modeling from Scratch
  CS 336 (Spr)
Independent Studies (13)
- Advanced Reading and Research
  CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research
  CS 499P (Aut, Win, Spr, Sum)
- Curricular Practical Training
  CS 390A (Aut, Win, Spr, Sum)
- Curricular Practical Training
  CS 390B (Aut, Win, Spr, Sum)
- Curricular Practical Training
  CS 390C (Aut, Win, Spr, Sum)
- Independent Project
  CS 399 (Aut, Win, Spr, Sum)
- Independent Work
  CS 199 (Aut, Win, Spr, Sum)
- Independent Work
  CS 199P (Aut, Win, Spr, Sum)
- Part-time Curricular Practical Training
  CS 390D (Aut, Win, Spr, Sum)
- Ph.D. Research
  CME 400 (Aut, Win, Sum)
- Ph.D. Research Rotation
  CME 391 (Sum)
- Supervised Undergraduate Research
  CS 195 (Aut, Sum)
- Writing Intensive Senior Research Project
  CS 191W (Aut, Win, Spr)
Prior Year Courses
2024-25 Courses
- Language Modeling from Scratch
  CS 336 (Spr)
- Natural Language Processing with Deep Learning
  CS 224N, LINGUIST 284, SYMSYS 195N (Win)
2023-24 Courses
- Language Modeling from Scratch
  CS 336 (Spr)
- Natural Language Processing with Deep Learning
  CS 224N, LINGUIST 284, SYMSYS 195N (Win)
2022-23 Courses
- Advances in Foundation Models
  CS 324 (Win)
- Machine Learning Under Distributional Shifts
  CS 329D (Spr)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Simran Arora, Mert Yuksekgonul
Orals Evaluator
Simran Arora, Tianyi Zhang
Doctoral Dissertation Advisor (AC)
Luke Bailey, Herman Brunborg, Tianyi Zhang
Doctoral Dissertation Co-Advisor (AC)
Neil Band, Saminul Haque, Suhas Kotha, Christopher Mohri, Zitong Yang
Master's Program Advisor
Nomin-Erdene Bayarsaikhan, Eric Chen, James Cheng, Andy Dai, Richard Gu, Jay Gupta, Jenny Jin, Niall Kehoe, Arpandeep Khatua, Aamin Kheir, Samuel Kwok, Yanav Lall, Thu Le, Sally Lee, Tony Lee, Shane Mion, Casey Nguyen, Nishikar Paruchuri, Poonam Sahoo, Laszlo Szilagyi, Xiyuan Wu, Alfred Yu, Felix Zhan, Steven Zhao
Postdoctoral Research Mentor
Meena Jagadeesan, Sung Min Park
Doctoral (Program)
Chenglei Si, Tristan Thrush, Tianyi Zhang

All Publications

Benchmarking Large Language Models for News Summarization TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS Zhang, T., Ladhak, F., Durmus, E., Liang, P., Mckeown, K., Hashimoto, T. B. 2024; 12: 39-57

View details for DOI 10.1162/tacl_a_00632

View details for Web of Science ID 001155346700003
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., Hashimoto, T. IEEE COMPUTER SOC. 2024: 132-143

View details for DOI 10.1109/SPW63631.2024.00018

View details for Web of Science ID 001267254800013
Accelerating Aggregation Queries on Unstructured Streams of Data PROCEEDINGS OF THE VLDB ENDOWMENT Russo, M., Hashimoto, T., Kang, D., Sun, Y., Zaharia, M. 2023; 16 (11): 2897-2910

View details for DOI 10.14778/3611479.3611496

View details for Web of Science ID 001059181900018
When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization Ladhak, F., Durmus, E., Suzgun, M., Zhang, T., Jurafsky, D., McKeown, K., Hashimoto, T. edited by Vlachos, A., Augenstein ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2023: 3206-3219

View details for Web of Science ID 001181056902011
Likelihood-Based Diffusion Language Models Gulrajani, I., Hashimoto, T. B. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001226352804011
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks Nie, A., Zhang, Y., Amdekar, A., Piech, C., Hashimoto, T., Gerstenberg, T. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001226352807022
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale Bianchi, F., Kalluri, P., Durmus, E., Ladhak, F., Cheng, M., Nozza, D., Hashimoto, T., Jurafsky, D., Zou, J., Caliskan, A. ASSOC COMPUTING MACHINERY. 2023: 1493-1504

View details for DOI 10.1145/3593013.3594095

View details for Web of Science ID 001062819300123
Contrastive Error Attribution for Finetuned Language Models Ladhak, F., Durmus, E., Hashimoto, T. edited by Rogers, A., Boyd-Graber, J., Okazaki, N. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2023: 11482-11498

View details for Web of Science ID 001190962503013
Contrastive Decoding: Open-ended Text Generation as Optimization Li, X., Holtzman, A., Fried, D., Liang, P., Eisner, J., Hashimoto, T., Zettlemoyer, L., Lewis, M. edited by Rogers, A., Boyd-Graber, J., Okazaki, N. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2023: 12286-12312

View details for Web of Science ID 001190962504003
Distributionally Robust Losses for Latent Covariate Mixtures OPERATIONS RESEARCH Duchi, J., Hashimoto, T., Namkoong, H. 2022

View details for DOI 10.1287/opre.2022.2363

View details for Web of Science ID 000851350700001
Spurious Correlations in Reference-Free Evaluation of Text Generation Durmus, E., Ladhak, F., Hashimoto, T., Assoc Computat Linguist ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2022: 1443-1454

View details for Web of Science ID 000828702301038
Identifiability Conditions for Domain Adaptation Gulrajani, I., Hashimoto, T. B., Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022

View details for Web of Science ID 000922378802047
Jury Learning: Integrating Dissenting Voices into Machine Learning Models Gordon, M. L., Lam, M. S., Park, J., Patel, K., Hancock, J. T., Hashimoto, T., Bernstein, M. S., ACM ASSOC COMPUTING MACHINERY. 2022

View details for DOI 10.1145/3491102.3502004

View details for Web of Science ID 000890212503009
TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data Kang, D., Guibas, J., Bailis, P. D., Hashimoto, T., Zaharia, M., ACM ASSOC COMPUTING MACHINERY. 2022: 1934-1947

View details for DOI 10.1145/3514221.3517897

View details for Web of Science ID 000852705400139
Accelerating Approximate Aggregation Queries with Expensive Predicates Kang, D., Guibas, J., Bailis, P., Hashimoto, T., Sun, Y., Zaharia, M. ASSOC COMPUTING MACHINERY. 2021: 2341-2354

View details for DOI 10.14778/3476249.3476285

View details for Web of Science ID 000742891100035
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions Demszky, D., Liu, J., Mancenido, Z., Cohen, J., Hill, H., Jurafsky, D., Hashimoto, T., Assoc Computat Linguist ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2021: 1638-1653

View details for Web of Science ID 000698663100130
Natural Language Generation, its Evaluation and Metrics Gehrmann, S., Adewumi, T., Aggarwal, K., Ammanamanchi, P., Anuoluwapo, A., Bosselut, A., Chandu, K., Clinciu, M., Das, D., Dhole, K. D., Du, W., Durmus, E., Gangal, V., Garbacea, C., Hashimoto, T., Hou, Y., Jernite, Y., Jhamtani, H., Ji, Y., Jolly, S., Kale, M., Kumar, D., Ladhak, F., Madaan, A., Maddela, M., Mahajan, K., Mahamood, S., Majumder, B., Martins, P., McMillan-Major, A., Mille, S., van Miltenburg, E., Nadeem, M., Narayan, S., Nikolaev, V., Niyongabo, R., Osei, S., Parikh, A., Perez-Beltrachini, L., Rao, N., Raunak, V., Rodriguez, J., Santhanam, S., Sedoc, J., Sellam, T., Shaikh, S., Shimorina, A., Sobrevilla Cabezudo, M., Strobelt, H., Subramani, N., Xu, W., Yang, D., Yerukola, A., Zhou, J., Dusek, O., Emezue, C., Assoc Computat Linguist ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2021: 96-120

View details for Web of Science ID 000697564200010
Perspectives on ENCODE. Nature ENCODE Project Consortium, Snyder, M. P., Gingeras, T. R., Moore, J. E., Weng, Z., Gerstein, M. B., Ren, B., Hardison, R. C., Stamatoyannopoulos, J. A., Graveley, B. R., Feingold, E. A., Pazin, M. J., Pagan, M., Gilchrist, D. A., Hitz, B. C., Cherry, J. M., Bernstein, B. E., Mendenhall, E. M., Zerbino, D. R., Frankish, A., Flicek, P., Myers, R. M., Abascal, F., Acosta, R., Addleman, N. J., Adrian, J., Afzal, V., Aken, B., Akiyama, J. A., Jammal, O. A., Amrhein, H., Anderson, S. M., Andrews, G. R., Antoshechkin, I., Ardlie, K. G., Armstrong, J., Astley, M., Banerjee, B., Barkal, A. A., Barnes, I. H., Barozzi, I., Barrell, D., Barson, G., Bates, D., Baymuradov, U. K., Bazile, C., Beer, M. A., Beik, S., Bender, M. A., Bennett, R., Bouvrette, L. P., Bernstein, B. E., Berry, A., Bhaskar, A., Bignell, A., Blue, S. M., Bodine, D. M., Boix, C., Boley, N., Borrman, T., Borsari, B., Boyle, A. P., Brandsmeier, L. A., Breschi, A., Bresnick, E. H., Brooks, J. A., Buckley, M., Burge, C. B., Byron, R., Cahill, E., Cai, L., Cao, L., Carty, M., Castanon, R. G., Castillo, A., Chaib, H., Chan, E. T., Chee, D. R., Chee, S., Chen, H., Chen, H., Chen, J., Chen, S., Cherry, J. M., Chhetri, S. B., Choudhary, J. S., Chrast, J., Chung, D., Clarke, D., Cody, N. A., Coppola, C. J., Coursen, J., D'Ippolito, A. M., Dalton, S., Danyko, C., Davidson, C., Davila-Velderrain, J., Davis, C. A., Dekker, J., Deran, A., DeSalvo, G., Despacio-Reyes, G., Dewey, C. N., Dickel, D. E., Diegel, M., Diekhans, M., Dileep, V., Ding, B., Djebali, S., Dobin, A., Dominguez, D., Donaldson, S., Drenkow, J., Dreszer, T. R., Drier, Y., Duff, M. O., Dunn, D., Eastman, C., Ecker, J. R., Edwards, M. D., El-Ali, N., Elhajjajy, S. I., Elkins, K., Emili, A., Epstein, C. B., Evans, R. C., Ezkurdia, I., Fan, K., Farnham, P. J., Farrell, N., Feingold, E. A., Ferreira, A., Fisher-Aylor, K., Fitzgerald, S., Flicek, P., Foo, C. 
S., Fortier, K., Frankish, A., Freese, P., Fu, S., Fu, X., Fu, Y., Fukuda-Yuzawa, Y., Fulciniti, M., Funnell, A. P., Gabdank, I., Galeev, T., Gao, M., Giron, C. G., Garvin, T. H., Gelboin-Burkhart, C. A., Georgolopoulos, G., Gerstein, M. B., Giardine, B. M., Gifford, D. K., Gilbert, D. M., Gilchrist, D. A., Gillespie, S., Gingeras, T. R., Gong, P., Gonzalez, A., Gonzalez, J. M., Good, P., Goren, A., Gorkin, D. U., Graveley, B. R., Gray, M., Greenblatt, J. F., Griffiths, E., Groudine, M. T., Grubert, F., Gu, M., Guigo, R., Guo, H., Guo, Y., Guo, Y., Gursoy, G., Gutierrez-Arcelus, M., Halow, J., Hardison, R. C., Hardy, M., Hariharan, M., Harmanci, A., Harrington, A., Harrow, J. L., Hashimoto, T. B., Hasz, R. D., Hatan, M., Haugen, E., Hayes, J. E., He, P., He, Y., Heidari, N., Hendrickson, D., Heuston, E. F., Hilton, J. A., Hitz, B. C., Hochman, A., Holgren, C., Hou, L., Hou, S., Hsiao, Y. E., Hsu, S., Huang, H., Hubbard, T. J., Huey, J., Hughes, T. R., Hunt, T., Ibarrientos, S., Issner, R., Iwata, M., Izuogu, O., Jaakkola, T., Jameel, N., Jansen, C., Jiang, L., Jiang, P., Johnson, A., Johnson, R., Jungreis, I., Kadaba, M., Kasowski, M., Kasparian, M., Kato, M., Kaul, R., Kawli, T., Kay, M., Keen, J. C., Keles, S., Keller, C. A., Kelley, D., Kellis, M., Kheradpour, P., Kim, D. S., Kirilusha, A., Klein, R. J., Knoechel, B., Kuan, S., Kulik, M. J., Kumar, S., Kundaje, A., Kutyavin, T., Lagarde, J., Lajoie, B. R., Lambert, N. J., Lazar, J., Lee, A. Y., Lee, D., Lee, E., Lee, J. W., Lee, K., Leslie, C. S., Levy, S., Li, B., Li, H., Li, N., Li, X., Li, Y. I., Li, Y., Li, Y., Li, Y., Lian, J., Libbrecht, M. W., Lin, S., Lin, Y., Liu, D., Liu, J., Liu, P., Liu, T., Liu, X. S., Liu, Y., Liu, Y., Long, M., Lou, S., Loveland, J., Lu, A., Lu, Y., Lecuyer, E., Ma, L., Mackiewicz, M., Mannion, B. J., Mannstadt, M., Manthravadi, D., Marinov, G. K., Martin, F. J., Mattei, E., McCue, K., McEown, M., McVicker, G., Meadows, S. K., Meissner, A., Mendenhall, E. M., Messer, C. 
L., Meuleman, W., Meyer, C., Miller, S., Milton, M. G., Mishra, T., Moore, D. E., Moore, H. M., Moore, J. E., Moore, S. H., Moran, J., Mortazavi, A., Mudge, J. M., Munshi, N., Murad, R., Myers, R. M., Nandakumar, V., Nandi, P., Narasimha, A. M., Narayanan, A. K., Naughton, H., Navarro, F. C., Navas, P., Nazarovs, J., Nelson, J., Neph, S., Neri, F. J., Nery, J. R., Nesmith, A. R., Newberry, J. S., Newberry, K. M., Ngo, V., Nguyen, R., Nguyen, T. B., Nguyen, T., Nishida, A., Noble, W. S., Novak, C. S., Novoa, E. M., Nunez, B., O'Donnell, C. W., Olson, S., Onate, K. C., Otterman, E., Ozadam, H., Pagan, M., Palden, T., Pan, X., Park, Y., Partridge, E. C., Paten, B., Pauli-Behn, F., Pazin, M. J., Pei, B., Pennacchio, L. A., Perez, A. R., Perry, E. H., Pervouchine, D. D., Phalke, N. N., Pham, Q., Phanstiel, D. H., Plajzer-Frick, I., Pratt, G. A., Pratt, H. E., Preissl, S., Pritchard, J. K., Pritykin, Y., Purcaro, M. J., Qin, Q., Quinones-Valdez, G., Rabano, I., Radovani, E., Raj, A., Rajagopal, N., Ram, O., Ramirez, L., Ramirez, R. N., Rausch, D., Raychaudhuri, S., Raymond, J., Razavi, R., Reddy, T. E., Reimonn, T. M., Ren, B., Reymond, A., Reynolds, A., Rhie, S. K., Rinn, J., Rivera, M., Rivera-Mulia, J. C., Roberts, B., Rodriguez, J. M., Rozowsky, J., Ryan, R., Rynes, E., Salins, D. N., Sandstrom, R., Sasaki, T., Sathe, S., Savic, D., Scavelli, A., Scheiman, J., Schlaffner, C., Schloss, J. A., Schmitges, F. W., See, L. H., Sethi, A., Setty, M., Shafer, A., Shan, S., Sharon, E., Shen, Q., Shen, Y., Sherwood, R. I., Shi, M., Shin, S., Shoresh, N., Siebenthall, K., Sisu, C., Slifer, T., Sloan, C. A., Smith, A., Snetkova, V., Snyder, M. P., Spacek, D. V., Srinivasan, S., Srivas, R., Stamatoyannopoulos, G., Stamatoyannopoulos, J. A., Stanton, R., Steffan, D., Stehling-Sun, S., Strattan, J. S., Su, A., Sundararaman, B., Suner, M., Syed, T., Szynkarek, M., Tanaka, F. Y., Tenen, D., Teng, M., Thomas, J. A., Toffey, D., Tress, M. L., Trout, D. 
E., Trynka, G., Tsuji, J., Upchurch, S. A., Ursu, O., Uszczynska-Ratajczak, B., Uziel, M. C., Valencia, A., Biber, B. V., van der Velde, A. G., Van Nostrand, E. L., Vaydylevich, Y., Vazquez, J., Victorsen, A., Vielmetter, J., Vierstra, J., Visel, A., Vlasova, A., Vockley, C. M., Volpi, S., Vong, S., Wang, H., Wang, M., Wang, Q., Wang, R., Wang, T., Wang, W., Wang, X., Wang, Y., Watson, N. K., Wei, X., Wei, Z., Weisser, H., Weissman, S. M., Welch, R., Welikson, R. E., Weng, Z., Westra, H., Whitaker, J. W., White, C., White, K. P., Wildberg, A., Williams, B. A., Wine, D., Witt, H. N., Wold, B., Wolf, M., Wright, J., Xiao, R., Xiao, X., Xu, J., Xu, J., Yan, K., Yan, Y., Yang, H., Yang, X., Yang, Y., Yardimci, G. G., Yee, B. A., Yeo, G. W., Young, T., Yu, T., Yue, F., Zaleski, C., Zang, C., Zeng, H., Zeng, W., Zerbino, D. R., Zhai, J., Zhan, L., Zhan, Y., Zhang, B., Zhang, J., Zhang, J., Zhang, K., Zhang, L., Zhang, P., Zhang, Q., Zhang, X., Zhang, Y., Zhang, Z., Zhao, Y., Zheng, Y., Zhong, G., Zhou, X., Zhu, Y., Zimmerman, J. 2020; 583 (7818): 693–98

Abstract

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression1. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.

View details for DOI 10.1038/s41586-020-2449-8

View details for PubMedID 32728248
Approximate Selection with Guarantees using Proxies PROCEEDINGS OF THE VLDB ENDOWMENT Kang, D., Gan, E., Bailis, P., Hashimoto, T., Zaharia, M. 2020; 13 (11): 1990–2003

View details for DOI 10.14778/3407790.3407804

View details for Web of Science ID 000573965600014
Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature Moore, J. E., Purcaro, M. J., Pratt, H. E., Epstein, C. B., Shoresh, N., Adrian, J., Kawli, T., Davis, C. A., Dobin, A., Kaul, R., Halow, J., Van Nostrand, E. L., Freese, P., Gorkin, D. U., Shen, Y., He, Y., Mackiewicz, M., Pauli-Behn, F., Williams, B. A., Mortazavi, A., Keller, C. A., Zhang, X. O., Elhajjajy, S. I., Huey, J., Dickel, D. E., Snetkova, V., Wei, X., Wang, X., Rivera-Mulia, J. C., Rozowsky, J., Zhang, J., Chhetri, S. B., Zhang, J., Victorsen, A., White, K. P., Visel, A., Yeo, G. W., Burge, C. B., Lécuyer, E., Gilbert, D. M., Dekker, J., Rinn, J., Mendenhall, E. M., Ecker, J. R., Kellis, M., Klein, R. J., Noble, W. S., Kundaje, A., Guigó, R., Farnham, P. J., Cherry, J. M., Myers, R. M., Ren, B., Graveley, B. R., Gerstein, M. B., Pennacchio, L. A., Snyder, M. P., Bernstein, B. E., Wold, B., Hardison, R. C., Gingeras, T. R., Stamatoyannopoulos, J. A., Weng, Z. 2020; 583 (7818): 699–710

Abstract

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE1 and Roadmap Epigenomics2 data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

View details for DOI 10.1038/s41586-020-2493-4

View details for PubMedID 32728249
Improved Natural Language Generation via Loss Truncation Kang, D., Hashimoto, T. B., Assoc Computat Linguist ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2020: 718–31

View details for Web of Science ID 000570978201001
Inferring Multidimensional Rates of Aging from Cross-Sectional Data. Proceedings of machine learning research Pierson, E., Koh, P. W., Hashimoto, T., Koller, D., Leskovec, J., Eriksson, N., Liang, P. 2019; 89: 97–107

Abstract

Modeling how individuals evolve over time is a fundamental problem in the natural and social sciences. However, existing datasets are often cross-sectional with each individual observed only once, making it impossible to apply traditional time-series methods. Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data. Our model represents each individual's features over time as a nonlinear function of a low-dimensional, linearly-evolving latent state. We prove that when this nonlinear function is constrained to be order-isomorphic, the model family is identifiable solely from cross-sectional data provided the distribution of time-independent variation is known. On the UK Biobank human health dataset, our model reconstructs the observed data while learning interpretable rates of aging associated with diseases, mortality, and aging risk factors.

View details for PubMedID 31538144
Inferring Multidimensional Rates of Aging from Cross-Sectional Data Pierson, E., Koh, P., Hashimoto, T., Koller, D., Leskovec, J., Eriksson, N., Liang, P. edited by Chaudhuri, K., Sugiyama, M. MICROTOME PUBLISHING. 2019: 97–107

View details for Web of Science ID 000509687900011
A Retrieve-and-Edit Framework for Predicting Structured Outputs Hashimoto, T. B., Guu, K., Oren, Y., Liang, P. edited by Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018

View details for Web of Science ID 000461852004059
DNase-capture reveals differential transcription factor binding modalities. PloS one Kang, D., Sherwood, R., Barkal, A., Hashimoto, T., Engstrom, L., Gifford, D. 2017; 12 (12): e0187046

Abstract

We describe DNase-capture, an assay that increases the analytical resolution of DNase-seq by focusing its sequencing phase on selected genomic regions. We introduce a new method to compensate for capture bias called BaseNormal that allows for accurate recovery of transcription factor protection profiles from DNase-capture data. We show that these normalized data allow for nuanced detection of transcription factor binding heterogeneity with as few as dozens of sites.

View details for DOI 10.1371/journal.pone.0187046

View details for PubMedID 29284001

View details for PubMedCentralID PMC5746236
Unsupervised Transformation Learning via Convex Relaxations Hashimoto, T. B., Duchi, J. C., Liang, P. edited by Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017

View details for Web of Science ID 000452649406090
A synergistic DNA logic predicts genome-wide chromatin accessibility. Genome research Hashimoto, T., Sherwood, R. I., Kang, D. D., Rajagopal, N., Barkal, A. A., Zeng, H., Emons, B. J., Srinivasan, S., Jaakkola, T., Gifford, D. K. 2016; 26 (10): 1430-1440

Abstract

Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which when trained with DNase-seq data for a cell type is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that a SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.

View details for DOI 10.1101/gr.199778.115

View details for PubMedID 27456004

View details for PubMedCentralID PMC5052050
Cas9 Functionally Opens Chromatin. PloS one Barkal, A. A., Srinivasan, S., Hashimoto, T., Gifford, D. K., Sherwood, R. I. 2016; 11 (3): e0152683

Abstract

Using a nuclease-dead Cas9 mutant, we show that Cas9 reproducibly induces chromatin accessibility at previously inaccessible genomic loci. Cas9 chromatin opening is sufficient to enable adjacent binding and transcriptional activation by the settler transcription factor retinoic acid receptor at previously unbound motifs. Thus, we demonstrate a new use for Cas9 in increasing surrounding chromatin accessibility to alter local transcription factor binding.

View details for DOI 10.1371/journal.pone.0152683

View details for PubMedID 27031353

View details for PubMedCentralID PMC4816323
GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding. Bioinformatics (Oxford, England) Zeng, H., Hashimoto, T., Kang, D. D., Gifford, D. K. 2016; 32 (4): 490-6

Abstract

The majority of disease-associated variants identified in genome-wide association studies reside in noncoding regions of the genome with regulatory roles. Thus being able to interpret the functional consequence of a variant is essential for identifying causal variants in the analysis of genome-wide association studies. We present GERV (generative evaluation of regulatory variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked single-nucleotide polymorphisms and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis. The implementation of GERV and related data are available at http://gerv.csail.mit.edu/.

View details for DOI 10.1093/bioinformatics/btv565

View details for PubMedID 26476779

View details for PubMedCentralID PMC5860000
Cloning-free CRISPR. Stem cell reports Arbab, M., Srinivasan, S., Hashimoto, T., Geijsen, N., Sherwood, R. I. 2015; 5 (5): 908-917

Abstract

We present self-cloning CRISPR/Cas9 (scCRISPR), a technology that allows for CRISPR/Cas9-mediated genomic mutation and site-specific knockin transgene creation within several hours by circumventing the need to clone a site-specific single-guide RNA (sgRNA) or knockin homology construct for each target locus. We introduce a self-cleaving palindromic sgRNA plasmid and a short double-stranded DNA sequence encoding the desired locus-specific sgRNA into target cells, allowing them to produce a locus-specific sgRNA plasmid through homologous recombination. scCRISPR enables efficient generation of gene knockouts (∼88% mutation rate) at approximately one-sixth the cost of plasmid-based sgRNA construction with only 2 hr of preparation for each targeted site. Additionally, we demonstrate efficient site-specific knockin of GFP transgenes without any plasmid cloning or genome-integrated selection cassette in mouse and human embryonic stem cells (2%-4% knockin rate) through PCR-based addition of short homology arms. scCRISPR substantially lowers the bar on mouse and human transgenesis.

View details for DOI 10.1016/j.stemcr.2015.09.022

View details for PubMedID 26527385

View details for PubMedCentralID PMC4649464
Universal count correction for high-throughput sequencing. PLoS computational biology Hashimoto, T. B., Edwards, M. D., Gifford, D. K. 2014; 10 (3): e1003494

Abstract

We show that existing RNA-seq, DNase-seq, and ChIP-seq data exhibit overdispersed per-base read count distributions that are not matched to existing computational method assumptions. To compensate for this overdispersion we introduce a nonparametric and universal method for processing per-base sequencing read count data called FIXSEQ. We demonstrate that FIXSEQ substantially improves the performance of existing RNA-seq, DNase-seq, and ChIP-seq analysis tools when compared with existing alternatives.

View details for DOI 10.1371/journal.pcbi.1003494

View details for PubMedID 24603409

View details for PubMedCentralID PMC3945112
Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nature biotechnology Sherwood, R. I., Hashimoto, T., O'Donnell, C. W., Lewis, S., Barkal, A. A., van Hoff, J. P., Karun, V., Jaakkola, T., Gifford, D. K. 2014; 32 (2): 171-178

Abstract

We describe protein interaction quantitation (PIQ), a computational method for modeling the magnitude and shape of genome-wide DNase I hypersensitivity profiles to identify transcription factor (TF) binding sites. Through the use of machine-learning techniques, PIQ identified binding sites for >700 TFs from one DNase I hypersensitivity analysis followed by sequencing (DNase-seq) experiment with accuracy comparable to that of chromatin immunoprecipitation followed by sequencing (ChIP-seq). We applied PIQ to analyze DNase-seq data from mouse embryonic stem cells differentiating into prepancreatic and intestinal endoderm. We identified 120 and experimentally validated eight 'pioneer' TF families that dynamically open chromatin. Four pioneer TF families only opened chromatin in one direction from their motifs. Furthermore, we identified 'settler' TFs whose genomic binding is principally governed by proximity to open chromatin. Our results support a model of hierarchical TF binding in which directional and nondirectional pioneer activity shapes the chromatin landscape for population by settler TFs.

View details for DOI 10.1038/nbt.2798

View details for PubMedID 24441470

View details for PubMedCentralID PMC3951735
Lineage-based identification of cellular states and expression programs. Bioinformatics (Oxford, England) Hashimoto, T., Jaakkola, T., Sherwood, R., Mazzoni, E. O., Wichterle, H., Gifford, D. 2012; 28 (12): i250-7

Abstract

We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.

View details for DOI 10.1093/bioinformatics/bts204

View details for PubMedID 22689769

View details for PubMedCentralID PMC3371836
