Mert Pilanci's Profile | Stanford Profiles

Bio

Mert Pilanci is an assistant professor of Electrical Engineering at Stanford University. He received his Ph.D. in Electrical Engineering and Computer Science from UC Berkeley in 2016. Prior to joining Stanford, he was an assistant professor of Electrical Engineering and Computer Science at the University of Michigan. In 2017, he was a Math+X postdoctoral fellow working with Emmanuel Candès at Stanford University. His research interests are in large-scale machine learning, optimization, and information theory.

Academic Appointments

Assistant Professor, Electrical Engineering

Honors & Awards

CAREER Award, National Science Foundation (2023)
Early Career Award, U.S. Army Research Office (2021)
International Conference on Acoustics, Speech, & Signal Processing (ICASSP) Best Paper Award, IEEE (2021)
Best Poster Award, Conference on the Mathematical Theory of Deep Neural Networks (2020)
Faculty Research Award, Facebook (2020)
Faculty Research Award, Adobe (2019)
Terman Faculty Fellow, Stanford University (2018)
Math+X Postdoctoral Fellowship, Simons Foundation (2016)
PhD Fellowship, Microsoft Research (2013)
Signal Processing and Communications Applications Conference Best Paper Award, IEEE (2010)

Program Affiliations

Stanford SystemX Alliance

Professional Education

Postdoctoral Fellow, Stanford University (2017)
PhD, University of California, Berkeley, Electrical Engineering and Computer Science (2016)

Current Research and Scholarly Interests

Dr. Pilanci's research interests include neural networks, machine learning, mathematical optimization, information theory and signal processing.

2025-26 Courses

Convex Optimization II
CME 364B, EE 364B (Spr)
Introductory Research Seminar in Electrical Engineering
EE 301 (Aut)
Signal Processing and Quantization for Machine Learning
EE 269 (Win)
Independent Studies (7)
- Curricular Practical Training
  CME 390 (Aut, Win, Spr, Sum)
- Directed Studies in Applied Physics
  APPPHYS 290 (Aut, Win, Spr, Sum)
- Master's Thesis and Thesis Research
  EE 300 (Aut, Win, Spr, Sum)
- Ph.D. Research
  CME 400 (Aut, Win, Spr, Sum)
- Ph.D. Research Rotation
  CME 391 (Win)
- Special Studies and Reports in Electrical Engineering
  EE 391 (Aut, Win, Spr, Sum)
- Special Studies or Projects in Electrical Engineering
  EE 390 (Aut, Win, Spr, Sum)
Prior Year Courses
2024-25 Courses
- Convex Optimization II
  CME 364B, EE 364B (Spr)
- Introductory Research Seminar in Electrical Engineering
  EE 301 (Aut)
2023-24 Courses
- Convex Optimization II
  CME 364B, EE 364B (Spr)
- Introductory Research Seminar in Electrical Engineering
  EE 301 (Aut)
- Signal Processing for Machine Learning
  EE 269 (Aut)
2022-23 Courses
- Convex Optimization II
  CME 364B, EE 364B (Spr)
- Introductory Research Seminar in Electrical Engineering
  EE 301 (Aut)
- Signal Processing for Machine Learning
  EE 269 (Win)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Amirhossein Afsharrad, Felipe Areces, Daniel Birger Gunnar Cederberg, Ibrahim Gulluk, Andrei Kanavalau, Ryan Po, Marina Qian, Naomi Sagan, Maximilian Schaller, Irmak Sivgin, Yonatan Urman, Emi Zeger, Orr Zohar
Doctoral Dissertation Advisor (AC)
Calvin Ang, Sungyoon Kim, Fangzhao Zhang
Master's Program Advisor
Shreya Ramanujam, Haowen Wang
Doctoral (Program)
Amirhossein Afsharrad, Mete Erdogan, Dorsa Fathollahi, Miria Feng, Ibrahim Gulluk, Sungyoon Kim

All Publications

Pretraining and the lasso JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY Craig, E., Pilanci, M., Le Menestrel, T., Narasimhan, B., Rivas, M. A., Gullaksen, S., Dehghannasiri, R., Salzman, J., Taylor, J., Tibshirani, R. 2025

View details for DOI 10.1093/jrsssb/qkaf050

View details for Web of Science ID 001547619600001
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models IEEE TRANSACTIONS ON INFORMATION THEORY Ergen, T., Pilanci, M. 2025; 71 (5): 3854-3870

View details for DOI 10.1109/TIT.2025.3545564

View details for Web of Science ID 001476867600028
Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions IEEE TRANSACTIONS ON INFORMATION THEORY Wang, Y., Hua, Y., Candes, E. J., Pilanci, M. 2025; 71 (3): 1926-1977

View details for DOI 10.1109/TIT.2025.3530355

View details for Web of Science ID 001468449300009
Disentangling Speech Representations Learning With Latent Diffusion for Speaker Verification IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING Li, Z., Mak, M., Chien, J., Pilanci, M., Jin, Z., Meng, H. 2025; 33: 3896-3907

View details for DOI 10.1109/TASLPRO.2025.3610023

View details for Web of Science ID 001579024300010
Spectral-Aware Low-Rank Adaptation for Speaker Verification Li, Z., Mak, M., Pilanci, M., Lee, H., Meng, H., IEEE IEEE. 2025

View details for DOI 10.1109/ICASSP49660.2025.10890259

View details for Web of Science ID 001611519700201
Adaptive Large Language Models via Attention Shortcuts Verma, P., Pilanci, M., IEEE IEEE. 2025

View details for DOI 10.1109/ICASSP49660.2025.10889322

View details for Web of Science ID 001611517600099
ConvexECG: Lightweight and Explainable Neural Networks for Personalized, Continuous Cardiac Monitoring Ansari, R., Cao, J., Bandyopadhyay, S., Narayan, S. M., Rogers, A. J., Pilanci, M., IEEE IEEE. 2025

View details for DOI 10.1109/ICASSP49660.2025.10889573

View details for Web of Science ID 001611517600350
Disentangling Speaker and Content in Pre-trained Speech Models with Latent Diffusion for Robust Speaker Verification Li, Z., Mak, M., Chien, J., Pilanci, M., Jin, Z., Meng, H., Int Speech Commun Assoc ISCA-INT SPEECH COMMUNICATION ASSOC. 2025: 1108-1112

View details for DOI 10.21437/Interspeech.2025-1865

View details for Web of Science ID 001585350500227
Subtractive Training for Music Stem Insertion Using Latent Diffusion Models Villa-Renteria, I., Wang, M., Shah, Z., Li, Z., Kim, S., Ramachandran, N., Pilanci, M., IEEE IEEE. 2025

View details for DOI 10.1109/ICASSP49660.2025.10887744

View details for Web of Science ID 001548470300185
Mutual Information-Enhanced Contrastive Learning With Margin for Maximal Speaker Separability IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING Li, Z., Mak, M., Pilanci, M., Meng, H. 2025; 33: 2961-2972

View details for DOI 10.1109/TASLPRO.2025.3583485

View details for Web of Science ID 001534498300001
Neural spectrahedra and semidefinite lifts: global convex optimization of degree-two polynomial activation neural networks in polynomial-time MATHEMATICAL PROGRAMMING Bartan, B., Pilanci, M. 2024

View details for DOI 10.1007/s10107-024-02153-5

View details for Web of Science ID 001350178400001
Gradient Coding With Iterative Block Leverage Score Sampling IEEE TRANSACTIONS ON INFORMATION THEORY Charalambides, N., Pilanci, M., Hero, A. O. 2024; 70 (9): 6639-6664

View details for DOI 10.1109/TIT.2024.3420222

View details for Web of Science ID 001299623600023
Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE Wang, Y., Chen, P., Pilanci, M., Li, W. 2024; 6 (4): 978-999

View details for DOI 10.1137/23M1573173

View details for Web of Science ID 001343415400006
Iterative Sketching for Secure Coded Regression IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY Charalambides, N., Mahdavifar, H., Pilanci, M., Hero, A. O. 2024; 5: 148-161

View details for DOI 10.1109/JSAIT.2024.3384395

View details for Web of Science ID 001395977600011
Power-Managed Data Centers for Sustainable Computing Zeger, E., Bambos, N., Pilanci, M. edited by Valenti, M., Reed, D., Torres, M. IEEE. 2024: 2833-2839

View details for DOI 10.1109/ICC51166.2024.10622993

View details for Web of Science ID 001300022502159
M-IHS: An accelerated randomized preconditioning method avoiding costly matrix decompositions LINEAR ALGEBRA AND ITS APPLICATIONS Ozaslan, I. K., Pilanci, M., Arikan, O. 2023; 678: 57-91

View details for DOI 10.1016/j.laa.2023.08.014

View details for Web of Science ID 001073651900001
Coil sketching for computationally efficient MR iterative reconstruction. Magnetic resonance in medicine Oscanoa, J. A., Ong, F., Iyer, S. S., Li, Z., Sandino, C. M., Ozturkler, B., Ennis, D. B., Pilanci, M., Vasanawala, S. S. 2023

Abstract

Parallel imaging and compressed sensing reconstructions of large MRI datasets often have a prohibitive computational cost that bottlenecks clinical deployment, especially for three-dimensional (3D) non-Cartesian acquisitions. One common approach is to reduce the number of coil channels actively used during reconstruction as in coil compression. While effective for Cartesian imaging, coil compression inherently loses signal energy, producing shading artifacts that compromise image quality for 3D non-Cartesian imaging. We propose coil sketching, a general and versatile method for computationally efficient iterative MR image reconstruction. We based our method on randomized sketching algorithms, a type of large-scale optimization algorithms well established in the fields of machine learning and big data analysis. We adapt the sketching theory to the MRI reconstruction problem via a structured sketching matrix that, similar to coil compression, considers high-energy virtual coils obtained from principal component analysis. But, unlike coil compression, it also considers random linear combinations of the remaining low-energy coils, effectively leveraging information from all coils. First, we performed ablation experiments to validate the sketching matrix design on both Cartesian and non-Cartesian datasets. The resulting design yielded both improved computational efficiency and preserved signal-to-noise ratio (SNR) as measured by the inverse g-factor. Then, we verified the efficacy of our approach on high-dimensional non-Cartesian 3D cones datasets, where coil sketching yielded up to three-fold faster reconstructions with equivalent image quality. Coil sketching is a general and versatile reconstruction framework for computationally fast and memory-efficient reconstruction.

View details for DOI 10.1002/mrm.29883

View details for PubMedID 37848365
Sketching the Krylov subspace: faster computation of the entire ridge regularization path (May, 10.1007/s11227-023-05309-w, 2023) JOURNAL OF SUPERCOMPUTING Wang, Y., Pilanci, M. 2023

View details for DOI 10.1007/s11227-023-05476-w

View details for Web of Science ID 001025419600003
Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration, and Lower Bounds IEEE TRANSACTIONS ON INFORMATION THEORY Bartan, B., Pilanci, M. 2023; 69 (6): 3850-3879

View details for DOI 10.1109/TIT.2023.3247559

View details for Web of Science ID 001008170200027
Sketching the Krylov subspace: faster computation of the entire ridge regularization path JOURNAL OF SUPERCOMPUTING Wang, Y., Pilanci, M. 2023

View details for DOI 10.1007/s11227-023-05309-w

View details for Web of Science ID 000990966900003
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks Ergen, T., Pilanci, M. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001227224007016
Matrix Compression via Randomized Low Rank and Low Precision Factorization Saha, R., Srivastava, V., Pilanci, M. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001226352805039
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs Dwaraknath, R., Ergen, T., Pilanci, M. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001220818809030
Adaptive and Oblivious Randomized Subspace Methods for High-Dimensional Optimization: Sharp Analysis and Lower Bounds IEEE TRANSACTIONS ON INFORMATION THEORY Lacotte, J., Pilanci, M. 2022; 68 (5): 3281-3303

View details for DOI 10.1109/TIT.2022.3146206

View details for Web of Science ID 000784190500036
Computational Polarization: An Information-Theoretic Method for Resilient Computing IEEE TRANSACTIONS ON INFORMATION THEORY Pilanci, M. 2022; 68 (4): 2211-2238

View details for DOI 10.1109/TIT.2021.3139009

View details for Web of Science ID 000770590900011
A Data-Driven Waveform Adaptation Method for Mm-Wave Gait Classification at the Edge IEEE SIGNAL PROCESSING LETTERS Hor, S., Pilanci, M., Arbabian, A. 2022; 29: 26-30

View details for DOI 10.1109/LSP.2021.3122355

View details for Web of Science ID 000745491900006
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions Mishkin, A., Sahiner, A., Pilanci, M. edited by Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022

View details for Web of Science ID 000900064905040
Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial Time Bartan, B., Pilanci, M. edited by Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022

View details for Web of Science ID 000899944901030
Secure Linear MDS Coded Matrix Inversion Charalambides, N., Pilanci, M., Hero, A. O., IEEE IEEE. 2022

View details for DOI 10.1109/ALLERTON49937.2022.9929386

View details for Web of Science ID 000895747000049
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers Sahiner, A., Ergen, T., Ozturkler, B., Pauly, J., Mardani, M., Pilanci, M. edited by Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022: 19050-19088

View details for Web of Science ID 000900130200004
Approximate Function Evaluation via Multi-Armed Bandits Baharav, T. Z., Cheng, G., Pilanci, M., Tse, D. edited by Camps-Valls, G., Ruiz, F. J., Valera JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022: 108-135

View details for Web of Science ID 000828072700007
Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction Gunel, B., Sahiner, A., Desai, A. D., Chaudhari, A. S., Vasanawala, S., Pilanci, M., Pauly, J. edited by Wang, L., Dou, Q., Fletcher, P. T., Speidel, S., Li, S. SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 737-747

View details for DOI 10.1007/978-3-031-16446-0_70

View details for Web of Science ID 000867434800070
Convex Geometry and Duality of Over-parameterized Neural Networks Ergen, T., Pilanci, M. 2021
Linear Predictive Coding for Acute Stress Prediction from Computer Mouse Movements. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference Kim, L. H., Goel, R., Liang, J., Pilanci, M., Paredes, P. E. 2021; 2021: 7465-7469

Abstract

Prior work demonstrated the potential of using the Linear Predictive Coding (LPC) filter to approximate muscle stiffness and damping from computer mouse movements to predict acute stress levels of users. Theoretically, muscle stiffness and damping in the arm can be estimated using a mass-spring-damper (MSD) biomechanical model. However, the damping frequency (i.e., stiffness) and damping ratio values derived using LPC were not yet compared with those from a theoretical MSD model. This work demonstrates that the damping frequency and damping ratio from LPC are significantly correlated with those from an MSD model, thus confirming the validity of using LPC to infer muscle stiffness and damping. We also compare the stress level binary classification performance using the values from LPC and MSD with each other and with neural network-based baselines. We found comparable performance across all conditions, demonstrating LPC and MSD model-based stress prediction efficacy, especially for longer mouse trajectories. Clinical relevance: This work demonstrates the validity of the LPC filter to approximate muscle stiffness and damping and predict acute stress from computer mouse movements.

View details for DOI 10.1109/EMBC46164.2021.9630217

View details for PubMedID 34892820
APPROXIMATE WEIGHTED CR CODED MATRIX MULTIPLICATION Charalambides, N., Pilanci, M., Hero, A. O., IEEE IEEE. 2021: 5095-5099

View details for DOI 10.1109/ICASSP39728.2021.9413800

View details for Web of Science ID 000704288405072
CONVEX NEURAL AUTOREGRESSIVE MODELS: TOWARDS TRACTABLE, EXPRESSIVE, AND THEORETICALLY-BACKED MODELS FOR SEQUENTIAL FORECASTING AND GENERATION Gupta, V., Bartan, B., Ergen, T., Pilanci, M., IEEE IEEE. 2021: 3890-3894

View details for DOI 10.1109/ICASSP39728.2021.9413662

View details for Web of Science ID 000704288404030
Boost AI Power: Data Augmentation Strategies with Unlabeled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose IEEE Sensors Journal Liu, L., Zhan, X., Wu, R., Guan, X., Wang, Z., Pilanci, M., Luo, Z., Li, G., Wang, Y. 2021: 1-11

View details for DOI 10.1109/JSEN.2021.3102488
Convex Geometry and Duality of Over-parameterized Neural Networks JOURNAL OF MACHINE LEARNING RESEARCH Ergen, T., Pilanci, M. 2021; 22

View details for Web of Science ID 000706451500001
Revealing the Structure of Deep Neural Networks via Convex Duality Ergen, T., Pilanci, M. edited by Meila, M., Zhang, T. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2021

View details for Web of Science ID 000683104603002
Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs Ergen, T., Pilanci, M. edited by Meila, M., Zhang, T. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2021

View details for Web of Science ID 000683104603001
Adaptive Newton Sketch: Linear-time Optimization with Quadratic Convergence and Effective Hessian Dimensionality Lacotte, J., Wang, Y., Pilanci, M. edited by Meila, M., Zhang, T. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2021

View details for Web of Science ID 000683104605087
WEIGHTED GRADIENT CODING WITH LEVERAGE SCORE SAMPLING Charalambides, N., Pilanci, M., Hero, A. O., IEEE IEEE. 2020: 5215–19

View details for Web of Science ID 000615970405095
Weighted Gradient Coding with Leverage Score Sampling IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Charalambides, N., Pilanci, M., Hero, A. O. 2020
Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) Ergen, T., Pilanci, M. 2020
Optimal Randomized First-Order Methods for Least-Squares Problems Lacotte, J., Pilanci, M. arXiv:2002.09488 . 2020

Abstract

We provide an exact analysis of a class of randomized algorithms for solving overdetermined least-squares problems. We consider first-order methods, where the gradients are pre-conditioned by an approximation of the Hessian, based on a subspace embedding of the data matrix. This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled randomized Hadamard transforms (SRHT). Our key technical innovation is the derivation of the limiting spectral density of SRHT embeddings. Leveraging this novel result, we derive the family of normalized orthogonal polynomials of the SRHT density and we find the optimal pre-conditioned first-order method along with its rate of convergence. Our analysis of Gaussian embeddings proceeds similarly, and leverages classical random matrix theory results. In particular, we show that for a given sketch size, SRHT embeddings exhibit a faster rate of convergence than Gaussian embeddings. Then, we propose a new algorithm by optimizing the computational complexity over the choice of the sketching dimension. To our knowledge, our resulting algorithm yields the best known complexity for solving least-squares problems with no condition number dependence.
Convex Duality of Deep Neural Networks Ergen, T., Pilanci, M. arXiv:2002.09773. 2020

Abstract

We study regularized deep neural networks and introduce an analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weight matrices for a norm regularized deep neural network training problem can be explicitly found as the extreme points of a convex set. For two-layer linear networks, we first formulate a convex dual program and prove that strong duality holds. We then extend our derivations to prove that strong duality also holds for certain deep networks. In particular, for linear deep networks, we show that each optimal layer weight matrix is rank-one and aligns with the previous layers when the network output is scalar. We also extend our analysis to the vector outputs and other convex loss functions. More importantly, we show that the same characterization can also be applied to deep ReLU networks with rank-one inputs, where we prove that strong duality still holds and optimal layer weight matrices are rank-one for scalar output networks. As a corollary, we prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. We then verify our theoretical results via several numerical experiments.
Convex Geometry and Duality of Over-parameterized Neural Networks Ergen, T., Pilanci, M. arXiv:2002.11219. 2020

Abstract

We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one dimensional regression and classification, as well as rank-one data matrices, we prove that finite two-layer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed form kernel matrix and minimum L1 norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem for two-layer networks can be cast as a convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cutting-plane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed form formulas for the optimal neural network weights in certain cases. We also establish a connection to ℓ0-ℓ1 equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing. Extensive experimental results show that the proposed approach yields interpretable and accurate models.
Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks Pilanci, M., Ergen, T. International Conference on Machine Learning (ICML), 2020. 2020

Abstract

We develop exact representations of two layer neural networks with rectified linear units in terms of a single convex program with number of variables polynomial in the number of training samples and number of hidden neurons. Our theory utilizes semi-infinite duality and minimum norm regularization. Moreover, we show that certain standard multi-layer convolutional neural networks are equivalent to L1 regularized linear models in a polynomial sized discrete Fourier feature space. We also introduce exact semi-definite programming representations of convolutional and fully connected linear multi-layer networks which are polynomial size in both the sample size and dimension.
Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models Ergen, T., Pilanci, M. edited by Chiappa, S., Calandra, R. ADDISON-WESLEY PUBL CO. 2020: 4024–32

View details for Web of Science ID 000559931300084
Limiting Spectrum of Randomized Hadamard Transform and Optimal Iterative Sketching Methods Lacotte, J., Liu, S., Dobriban, E., Pilanci, M. International Conference on Machine Learning (ICML), 2020. 2020

Abstract

We provide an exact analysis of the limiting spectrum of matrices randomly projected either with the subsampled randomized Hadamard transform, or truncated Haar matrices. We characterize this limiting distribution through its Stieltjes transform, a classical object in random matrix theory, and compute the first and second inverse moments. We leverage the limiting spectrum and asymptotic freeness of random matrices to obtain an exact analysis of iterative sketching methods for solving least squares problems. Our results also yield optimal step-sizes and convergence rates in terms of simple closed-form expressions. Moreover, we show that the convergence rates for Haar and randomized Hadamard matrices are identical, and uniformly improve upon Gaussian random projections. The developed techniques and formulas can be applied to a plethora of randomized algorithms that employ fast randomized Hadamard dimension reduction.
High-Dimensional Optimization in Adaptive Random Subspaces Lacotte, J., Pilanci, M., Pavone, M. edited by Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019

View details for Web of Science ID 000535866902047
Convex Optimization for Shallow Neural Networks Ergen, T., Pilanci, M. IEEE. 2019: 79–83

View details for Web of Science ID 000535355700013
Distributed Black-Box Optimization via Error Correcting Codes Bartan, B., Pilanci, M. IEEE. 2019: 246–52

View details for Web of Science ID 000535355700035
Straggler Resilient Serverless Computing Based on Polar Codes Bartan, B., Pilanci, M. IEEE. 2019: 276–83

View details for Web of Science ID 000535355700039
Faster Least Squares Optimization Lacotte, J., Pilanci, M. arXiv:1911.02675. 2019

Abstract

We investigate randomized methods for solving overdetermined linear least-squares problems, where the Hessian is approximated based on a random projection of the data matrix. We consider a random subspace embedding which is either drawn at the beginning of the algorithm and fixed throughout, or, refreshed at each iteration. For a broad class of random matrices, we provide an exact finite-time analysis of the refreshed embeddings method, an exact asymptotic analysis of the fixed embedding method, as well as a non-asymptotic analysis, with and without momentum acceleration. Surprisingly, we show that, for Gaussian matrices, the refreshed sketching method with no momentum yields the same convergence rate as the fixed embedding method with momentum. Furthermore, we prove that momentum does not accelerate the refreshed embeddings method. Thus, picking the accelerated, fixed embedding method as the algorithm of choice among the methods we consider, we propose a new algorithm by optimizing the computational complexity over the choice of the sketching dimension. Our resulting algorithm yields a smaller complexity compared to current state-of-the-art randomized pre-conditioning methods. In particular, as the sample size grows, the resulting complexity becomes sub-linear in the problem dimensions. We validate numerically our guarantees on large sample datasets, both for Gaussian and SRHT embeddings.
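The fixed-embedding idea described in this abstract can be illustrated with a minimal sketch, assuming a Gaussian sketching matrix drawn once and no momentum. The function name `sketched_least_squares`, the sketch scaling, and the step size `mu` are illustrative assumptions rather than the authors' implementation; the step size here is derived from the asymptotic edges of the spectrum of a Gaussian sketch and is only a reasonable default when the sketch size is several times the number of columns.

```python
import numpy as np

def sketched_least_squares(A, b, sketch_size, num_iters=50, seed=0):
    """Approximately solve min_x ||Ax - b||^2 using a fixed sketched Hessian.

    Hypothetical helper for illustration; assumes sketch_size > A.shape[1].
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # Gaussian embedding, drawn once and kept fixed for all iterations.
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    SA = S @ A

    # Sketched Hessian (SA)^T (SA) stands in for the exact Hessian A^T A.
    H_S = SA.T @ SA

    # Assumed step size: balances the approximate extreme eigenvalues of
    # H_S^{-1} A^T A, which are near 1/(1 +/- sqrt(rho))^2 with rho = d/m.
    rho = d / sketch_size
    mu = (1.0 - rho) ** 2 / (1.0 + rho)

    x = np.zeros(d)
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                 # exact gradient (up to a factor of 2)
        x = x - mu * np.linalg.solve(H_S, grad)  # step preconditioned by the sketch
    return x
```

A typical call would be `sketched_least_squares(A, b, sketch_size=10 * A.shape[1])`, whose output can be compared against `np.linalg.lstsq(A, b, rcond=None)[0]`. Each iteration uses the exact gradient, so the sketch affects only the convergence rate and not the point the iterates converge to.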
Straggler Resilient Serverless Computing Based on Polar Codes 57th Annual Allerton Conference on Communication, Control, and Computing Bartan, B., Pilanci, M. 2019
Distributed Black-Box Optimization via Error Correcting Codes 57th Annual Allerton Conference on Communication, Control, and Computing Bartan, B., Pilanci, M. 2019
High-Dimensional Optimization in Adaptive Random Subspaces Neural Information Processing Systems (NeurIPS) Lacotte, J., Pilanci, M., Pavone, M. 2019
Fast and Robust Solution Techniques for Large Scale Linear System of Equations Ozaslan, I. K., Pilanci, M., Arikan, O. IEEE. 2019

View details for Web of Science ID 000518994300160
Convex Relaxations of Convolutional Neural Nets Bartan, B., Pilanci, M. IEEE. 2019: 4928–32

View details for Web of Science ID 000482554005033
Iterative Hessian Sketch with Momentum Ozaslan, I., Pilanci, M., Arikan, O. IEEE. 2019: 7470–74

View details for Web of Science ID 000482554007141
Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence SIAM Journal on Optimization Pilanci, M., Wainwright, M. J. 2017; 27 (1): 205–45

View details for DOI 10.1137/15M1021106

View details for Web of Science ID 000404178500010
Randomized sketches for kernels: Fast and optimal non-parametric regression Annals of Statistics Yang, Y., Pilanci, M., Wainwright, M. J. 2017
Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares Journal of Machine Learning Research (JMLR) Pilanci, M., Wainwright, M. J. 2016
Sparse learning via Boolean relaxations Mathematical Programming Pilanci, M., Wainwright, M. J., El Ghaoui, L. 2015
Randomized Sketches of Convex Programs With Sharp Guarantees IEEE Transactions on Information Theory Pilanci, M., Wainwright, M. J. 2015
Expectation Maximization Based Matching Pursuit Gurbuz, A., Pilanci, M., Arikan, O. IEEE. 2012: 3313-3316

View details for Web of Science ID 000312381403097
Structured Least Squares Problems and Robust Estimators IEEE Transactions on Signal Processing Pilanci, M., Arikan, O., Pinar, M. C. 2010; 58 (5): 2453-2465

View details for DOI 10.1109/TSP.2010.2041279

View details for Web of Science ID 000276685600001
A Novel Technique for a Linear System of Equations Applied to Channel Equalization Pilanci, M., Arikan, O., Oguz, B., Pinar, M. C. IEEE. 2009: 230-+

View details for Web of Science ID 000273935600058
