Gregory Valiant
Associate Professor of Computer Science
Academic Appointments
-
Associate Professor, Computer Science
-
Member, Bio-X
-
Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI)
Honors & Awards
-
Young Investigator Award, Office of Naval Research (2018)
-
Hellman Faculty Scholar, The Hellman Fellows Program (2017)
-
Sloan Foundation Fellowship, The Sloan Foundation (2016)
-
NSF CAREER Award, National Science Foundation (2014)
-
ACM Doctoral Dissertation Award, Honorable Mention, Association for Computing Machinery (ACM) (2012)
Professional Education
-
PhD, UC Berkeley, Computer Science (2012)
-
BA, Harvard University, Mathematics (2006)
Current Research and Scholarly Interests
My primary research interests lie at the intersection of algorithms, learning, applied probability, and statistics. I am particularly interested in understanding the algorithmic and information theoretic possibilities and limitations for many fundamental information extraction tasks that underly real-world machine learning and data-centric applications.
2024-25 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Win) - The Modern Algorithmic Toolbox
CS 168 (Spr) -
Independent Studies (13)
- Advanced Reading and Research
CS 499 (Aut, Win, Spr) - Advanced Reading and Research
CS 499P (Aut, Win, Spr) - Curricular Practical Training
CS 390A (Aut, Win, Spr) - Curricular Practical Training
CS 390B (Aut, Win, Spr) - Curricular Practical Training
CS 390C (Aut, Win, Spr) - Independent Project
CS 399 (Aut, Win, Spr) - Independent Project
CS 399P (Aut, Win, Spr) - Independent Work
CS 199 (Aut, Win, Spr) - Independent Work
CS 199P (Aut, Win, Spr) - Part-time Curricular Practical Training
CS 390D (Aut, Win, Spr) - Senior Project
CS 191 (Aut, Win, Spr) - Supervised Undergraduate Research
CS 195 (Aut, Win, Spr) - Writing Intensive Senior Research Project
CS 191W (Aut, Win, Spr)
- Advanced Reading and Research
-
Prior Year Courses
2023-24 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Aut) - The Modern Algorithmic Toolbox
CS 168 (Spr)
2022-23 Courses
- The Modern Algorithmic Toolbox
CS 168 (Spr)
2021-22 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Win) - The Modern Algorithmic Toolbox
CS 168 (Spr)
- Randomized Algorithms and Probabilistic Analysis
Stanford Advisees
-
Doctoral Dissertation Co-Advisor (AC)
Steven Cao, Spencer Compton, Chirag Pabbaraju, Aidan Perreault, Roshni Sahoo -
Master's Program Advisor
Whayden Dhamcho, Rayan Krishnan, Joe Tsai, Diego Valdez Duran, Adrien Wu -
Doctoral (Program)
Vijaykrishna Gurunathan, Mika Jain
All Publications
- Memory-Sample Tradeoffs for Linear Regression with Small Error Symposium on Theory of Computing (STOC) 2019
-
Estimating the Unseen: Improved Estimators for Entropy and Other Properties
JOURNAL OF THE ACM
2017; 64 (6)
View details for DOI 10.1145/3125643
View details for Web of Science ID 000418048800001
-
SPECTRUM ESTIMATION FROM SAMPLES
ANNALS OF STATISTICS
2017; 45 (5): 2218–47
View details for DOI 10.1214/16-AOS1525
View details for Web of Science ID 000416455300014
-
AN AUTOMATIC INEQUALITY PROVER AND INSTANCE OPTIMAL IDENTITY TESTING
SIAM JOURNAL ON COMPUTING
2017; 46 (1): 429-455
View details for DOI 10.1137/151002526
View details for Web of Science ID 000396677400016
-
Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects.
Nature communications
2016; 7: 13293-?
Abstract
As new proposals aim to sequence ever larger collection of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.
View details for DOI 10.1038/ncomms13293
View details for PubMedID 27796292
View details for PubMedCentralID PMC5095512
-
Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem
JOURNAL OF THE ACM
2015; 62 (2)
View details for DOI 10.1145/2728167
View details for Web of Science ID 000354799200004
- Learning from Untrusted Data Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC) 2017
- Maximum Likelihood Estimation for Learning Populations of Parameters ICML 2019
-
Approximating the Spectrum of a Graph
ASSOC COMPUTING MACHINERY. 2018: 1263–71
View details for DOI 10.1145/3219819.3220119
View details for Web of Science ID 000455346400131
-
A Spectral View of Adversarially Robust Features
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461852004067
-
Learning Overcomplete HMMs
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649400090
- Learning Sparse Polynomial Functions. 2014
- Optimal Algorithms for Testing Closeness of Discrete Distributions. 2014
- Least Squares Revisited: Scalable Approaches for Multi-class Prediction. CoRR abs/1310.1949 2013
- Instance-by-instance optimal identity testing. Electronic Colloquium on Computational Complexity (ECCC) 2013; 20: 111
- Testing k-Modal Distributions: Optimal Algorithms via Reductions. 2013
- Estimating the Unseen: Improved Estimators for Entropy and other Properties. 2013
- Computation in anonymous networks. CoRR abs/1306.4151 2013
- Optimal Algorithms for Testing Closeness of Discrete Distributions. CoRR abs/1308.3946 2013
- Size and Treewidth Bounds for Conjunctive Queries. J. ACM 2012; 3 (59): 16
- Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas. 2012
- Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas with Noise. Electronic Colloquium on Computational Complexity (ECCC) 2012; 19: 6
- Disentangling Gaussians. Commun. ACM 2012; 2 (55): 113-120
- Beating brute-force: Improved algorithms for finding correlations, and related problems. TinyToCS 2012; 1
- Testing $k$-Modal Distributions: Optimal Algorithms via Reductions. CoRR abs/1112.5659 2011
- The Power of Linear Estimators. 2011
- Best-response auctions. 2011
- When is it best to best-respond? SIGecom Exchanges 2011; 2 (10): 16-18
- Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. 2011
- Best-Response Mechanisms. 2011
- Incentive-compatible distributed greedy protocols. 2011
-
Braess's Paradox in Large Random Graphs
RANDOM STRUCTURES & ALGORITHMS
2010; 37 (4): 495-515
View details for DOI 10.1002/rsa.20325
View details for Web of Science ID 000284038400006
-
DESIGNING NETWORK PROTOCOLS FOR GOOD EQUILIBRIA
SIAM JOURNAL ON COMPUTING
2010; 39 (5): 1799-1832
View details for DOI 10.1137/08072721X
View details for Web of Science ID 000277584700005
- Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC) 2010; 17: 180
- A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC) 2010; 17: 183
- On Learning Algorithms for Nash Equilibria. 2010
- Efficiently learning mixtures of two Gaussians. 2010
- A New Look at Selfish Routing. 2010
- Settling the Polynomial Learnability of Mixtures of Gaussians. CoRR abs/1004.4223 2010
- Settling the Polynomial Learnability of Mixtures of Gaussians. 2010
- Size and treewidth bounds for conjunctive queries. 2009
- Size Bounds for Conjunctive Queries with General Functional Dependencies. CoRR abs/0909.2030 2009
- On the complexity of Nash equilibria of action-graph games. 2009
-
Designing Networks with Good Equilibria
19th ACM-SIAM Symposium on Discrete Algorithms
SIAM. 2008: 854–863
View details for Web of Science ID 000281596900094
- On the Complexity of Nash Equilibria of Action-Graph Games. CoRR abs/0802.1604 2008
- Braess's paradox in large random graphs. 2006