Gregory Valiant
Associate Professor of Computer Science
Academic Appointments
-
Associate Professor, Computer Science
-
Member, Bio-X
-
Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI)
Honors & Awards
-
Young Investigator Award, Office of Naval Research (2018)
-
Hellman Faculty Scholar, The Hellman Fellows Program (2017)
-
Sloan Foundation Fellowship, The Sloan Foundation (2016)
-
NSF CAREER Award, National Science Foundation (2014)
-
ACM Doctoral Dissertation Award, Honorable Mention, Association for Computing Machinery (ACM) (2012)
Professional Education
-
PhD, UC Berkeley, Computer Science (2012)
-
BA, Harvard University, Mathematics (2006)
Current Research and Scholarly Interests
My primary research interests lie at the intersection of algorithms, learning, applied probability, and statistics. I am particularly interested in understanding the algorithmic and information theoretic possibilities and limitations for many fundamental information extraction tasks that underly real-world machine learning and data-centric applications.
2025-26 Courses
-
Independent Studies (13)
- Advanced Reading and Research
CS 499 (Aut, Win, Spr, Sum) - Advanced Reading and Research
CS 499P (Aut, Win, Spr, Sum) - Curricular Practical Training
CS 390A (Aut, Win, Spr, Sum) - Curricular Practical Training
CS 390B (Aut, Win, Spr, Sum) - Curricular Practical Training
CS 390C (Aut, Win, Spr, Sum) - Independent Project
CS 399 (Aut, Win, Spr, Sum) - Independent Project
CS 399P (Aut, Win, Spr, Sum) - Independent Work
CS 199 (Aut, Win, Spr, Sum) - Independent Work
CS 199P (Aut, Win, Spr, Sum) - Part-time Curricular Practical Training
CS 390D (Aut, Win, Spr, Sum) - Senior Project
CS 191 (Aut, Win, Spr) - Supervised Undergraduate Research
CS 195 (Aut, Win, Spr, Sum) - Writing Intensive Senior Research Project
CS 191W (Aut, Win, Spr)
- Advanced Reading and Research
-
Prior Year Courses
2023-24 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Aut) - The Modern Algorithmic Toolbox
CS 168 (Spr)
2022-23 Courses
- The Modern Algorithmic Toolbox
CS 168 (Spr)
- Randomized Algorithms and Probabilistic Analysis
Stanford Advisees
-
Doctoral Dissertation Reader (AC)
Guy Blanc -
Orals Evaluator
Roshni Sahoo -
Doctoral Dissertation Co-Advisor (AC)
Steven Cao, Chirag Pabbaraju, Aidan Perreault, Roshni Sahoo -
Master's Program Advisor
Arjun Jain -
Doctoral (Program)
Spencer Compton, Mika Jain
All Publications
- Memory-Sample Tradeoffs for Linear Regression with Small Error Symposium on Theory of Computing (STOC) 2019
-
Estimating the Unseen: Improved Estimators for Entropy and Other Properties
JOURNAL OF THE ACM
2017; 64 (6)
View details for DOI 10.1145/3125643
View details for Web of Science ID 000418048800001
-
SPECTRUM ESTIMATION FROM SAMPLES
ANNALS OF STATISTICS
2017; 45 (5): 2218–47
View details for DOI 10.1214/16-AOS1525
View details for Web of Science ID 000416455300014
-
AN AUTOMATIC INEQUALITY PROVER AND INSTANCE OPTIMAL IDENTITY TESTING
SIAM JOURNAL ON COMPUTING
2017; 46 (1): 429-455
View details for DOI 10.1137/151002526
View details for Web of Science ID 000396677400016
-
Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects.
Nature communications
2016; 7: 13293-?
Abstract
As new proposals aim to sequence ever larger collection of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.
View details for DOI 10.1038/ncomms13293
View details for PubMedID 27796292
View details for PubMedCentralID PMC5095512
-
Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem
JOURNAL OF THE ACM
2015; 62 (2)
View details for DOI 10.1145/2728167
View details for Web of Science ID 000354799200004
- Learning from Untrusted Data Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC) 2017
-
Adaptive and Oblivious Statistical Adversaries Are Equivalent
edited by Koucky, M., Bansal, N.
ASSOC COMPUTING MACHINERY. 2025: 2031-2042
View details for DOI 10.1145/3717823.3718193
View details for Web of Science ID 001595410700188
-
Generalized Trace Reconstruction Problem: Recovering a String of Probabilities
edited by Koucky, M., Bansal, N.
ASSOC COMPUTING MACHINERY. 2025: 1657-1667
View details for DOI 10.1145/3717823.3718315
View details for Web of Science ID 001595410700154
-
ON THE STATISTICAL COMPLEXITY OF SAMPLE AMPLIFICATION
ANNALS OF STATISTICS
2024; 52 (6): 2767-2790
View details for DOI 10.1214/24-AOS2444
View details for Web of Science ID 001381606500012
-
Efficient Convex Optimization Requires Superlinear Memory
JOURNAL OF THE ACM
2024; 71 (6)
View details for DOI 10.1145/3689208
View details for Web of Science ID 001391326700006
-
Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances
edited by Mohar, B., Shinkar, O'Donnell, R.
ASSOC COMPUTING MACHINERY. 2024: 194-200
View details for DOI 10.1145/3618260.3649754
View details for Web of Science ID 001254099900020
-
Matrix Multiplication in Quadratic Time and Energy? Towards a Fine-Grained Energy-Centric Church-Turing Thesis
edited by Guruswami
SCHLOSS DAGSTUHL, LEIBNIZ CENTER INFORMATICS. 2024
View details for DOI 10.4230/LIPIcs.ITCS.2024.96
View details for Web of Science ID 001300389400096
-
Efficient Convex Optimization Requires Superlinear Memory (Extended Abstract)
edited by Elkind, E.
IJCAI-INT JOINT CONF ARTIF INTELL. 2023: 6468-6473
View details for Web of Science ID 001202344206068
-
Lexinvariant Language Models
edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001226352804003
-
One-sided Matrix Completion from Two Observations Per Row
edited by Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001206894303027
-
ReporterSeq reveals genome-wide dynamic modulators of the heat shock response across diverse stressors.
eLife
2021; 10
Abstract
Understanding cellular stress response pathways is challenging because of the complexity of regulatory mechanisms and response dynamics, which can vary with both time and the type of stress. We developed a reverse genetic method called ReporterSeq to comprehensively identify genes regulating a stress-induced transcription factor under multiple conditions in a time-resolved manner. ReporterSeq links RNA-encoded barcode levels to pathway-specific output under genetic perturbations, allowing pooled pathway activity measurements via DNA sequencing alone and without cell enrichment or single-cell isolation. We used ReporterSeq to identify regulators of the heat shock response (HSR), a conserved, poorly understood transcriptional program that protects cells from proteotoxicity and is misregulated in disease. Genome-wide HSR regulation in budding yeast was assessed across 15 stress conditions, uncovering novel stress-specific, time-specific, and constitutive regulators. ReporterSeq can assess the genetic regulators of any transcriptional pathway with the scale of pooled genetic screens and the precision of pathway-specific readouts.
View details for DOI 10.7554/eLife.57376
View details for PubMedID 34223816
-
Misspecification in Prediction Problems and Robustness via Improper Learning
edited by Banerjee, A., Fukumizu, K.
MICROTOME PUBLISHING. 2021
View details for Web of Science ID 000659893802060
-
Stronger Calibration Lower Bounds via Sidestepping
edited by Khuller, S., Williams, V. V.
ASSOC COMPUTING MACHINERY. 2021: 456-466
View details for DOI 10.1145/3406325.3451050
View details for Web of Science ID 000810492500048
-
Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training
edited by Meila, M., Zhang, T.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2021: 7067-7079
View details for Web of Science ID 000768182700007
-
Beyond Laurel/Yanny: An Autoencoder-Enabled Search for Polyperceivable Audio
ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2021: 593-598
View details for Web of Science ID 000694699200075
-
Sample Amplification: Increasing Dataset Size even when Learning is Impossible
edited by Daume, H., Singh, A.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2020
View details for Web of Science ID 000683178500042
-
A Polynomial Time Algorithm for Log-Concave Maximum Likelihood via Locally Exponential Families
edited by Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
View details for Web of Science ID 000534424307071
-
Memory-Sample Tradeoffs for Linear Regression with Small Error
edited by Charikar, M., Cohen, E.
ASSOC COMPUTING MACHINERY. 2019: 890–901
View details for DOI 10.1145/3313276.3316403
View details for Web of Science ID 000523199100080
- Maximum Likelihood Estimation for Learning Populations of Parameters ICML 2019
-
Making AI Forget You: Data Deletion in Machine Learning
edited by Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
View details for Web of Science ID 000534424303050
-
A Spectral View of Adversarially Robust Features
edited by Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461852004067
-
Approximating the Spectrum of a Graph
ASSOC COMPUTING MACHINERY. 2018: 1263–71
View details for DOI 10.1145/3219819.3220119
View details for Web of Science ID 000455346400131
-
Prediction with a Short Memory
edited by Diakonikolas, Kempe, D., Henzinger, M.
ASSOC COMPUTING MACHINERY. 2018: 1074–87
View details for DOI 10.1145/3188745.3188954
View details for Web of Science ID 000458175600092
-
Estimating Learnability in the Sublinear Data Regime
edited by Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823305047
-
Sketching Linear Classifiers over Data Streams
edited by Das, G., Jermaine, C., Bernstein, P., Eldawy, A.
ASSOC COMPUTING MACHINERY. 2018: 757–72
View details for DOI 10.1145/3183713.3196930
View details for Web of Science ID 000460373700052
-
Learning from Untrusted Data
edited by Hatami, H., McKenzie, P., King
ASSOC COMPUTING MACHINERY. 2017: 47–60
View details for DOI 10.1145/3055399.3055491
View details for Web of Science ID 000440317600012
-
Learning Populations of Parameters
edited by Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649405083
-
Learning Overcomplete HMMs
edited by Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649400090
- Learning Sparse Polynomial Functions. 2014
- Optimal Algorithms for Testing Closeness of Discrete Distributions. 2014
- Computation in anonymous networks. CoRR abs/1306.4151 2013
- Instance-by-instance optimal identity testing. Electronic Colloquium on Computational Complexity (ECCC) 2013; 20: 111
- Testing k-Modal Distributions: Optimal Algorithms via Reductions. 2013
- Optimal Algorithms for Testing Closeness of Discrete Distributions. CoRR abs/1308.3946 2013
- Estimating the Unseen: Improved Estimators for Entropy and other Properties. 2013
- Least Squares Revisited: Scalable Approaches for Multi-class Prediction. CoRR abs/1310.1949 2013
- Size and Treewidth Bounds for Conjunctive Queries. J. ACM 2012; 3 (59): 16
- Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas. 2012
- Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas with Noise. Electronic Colloquium on Computational Complexity (ECCC) 2012; 19: 6
- Disentangling Gaussians. Commun. ACM 2012; 2 (55): 113-120
- Beating brute-force: Improved algorithms for finding correlations, and related problems. TinyToCS 2012; 1
- Testing $k$-Modal Distributions: Optimal Algorithms via Reductions. CoRR abs/1112.5659 2011
- Best-Response Mechanisms. 2011
- The Power of Linear Estimators. 2011
- Best-response auctions. 2011
- When is it best to best-respond? SIGecom Exchanges 2011; 2 (10): 16-18
- Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. 2011
- Incentive-compatible distributed greedy protocols. 2011
-
Braess's Paradox in Large Random Graphs
RANDOM STRUCTURES & ALGORITHMS
2010; 37 (4): 495-515
View details for DOI 10.1002/rsa.20325
View details for Web of Science ID 000284038400006
-
DESIGNING NETWORK PROTOCOLS FOR GOOD EQUILIBRIA
SIAM JOURNAL ON COMPUTING
2010; 39 (5): 1799-1832
View details for DOI 10.1137/08072721X
View details for Web of Science ID 000277584700005
- Efficiently learning mixtures of two Gaussians. 2010
- A New Look at Selfish Routing. 2010
- Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC) 2010; 17: 180
- Settling the Polynomial Learnability of Mixtures of Gaussians. CoRR abs/1004.4223 2010
- Settling the Polynomial Learnability of Mixtures of Gaussians. 2010
- A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC) 2010; 17: 183
- On Learning Algorithms for Nash Equilibria. 2010
- Size Bounds for Conjunctive Queries with General Functional Dependencies. CoRR abs/0909.2030 2009
- On the complexity of Nash equilibria of action-graph games. 2009
- Size and treewidth bounds for conjunctive queries. 2009
-
Designing Networks with Good Equilibria
19th ACM-SIAM Symposium on Discrete Algorithms
SIAM. 2008: 854–863
View details for Web of Science ID 000281596900094
- On the Complexity of Nash Equilibria of Action-Graph Games. CoRR abs/0802.1604 2008
- Braess's paradox in large random graphs. 2006