Jerry Friedman
Professor of Statistics, Emeritus
2024-25 Courses
- Consulting Workshop
STATS 390 (Spr)
Prior Year Courses
2023-24 Courses
- Consulting Workshop
STATS 390 (Spr)
2022-23 Courses
- Consulting Workshop
STATS 390 (Spr)
2021-22 Courses
- Consulting Workshop
STATS 390 (Spr)
All Publications
-
rCOSA: A Software Package for Clustering Objects on Subsets of Attributes
JOURNAL OF CLASSIFICATION
2017; 34 (3): 514-547
View details for DOI 10.1007/s00357-017-9240-z
View details for Web of Science ID 000416539000010
-
A New Graph-Based Two-Sample Test for Multivariate and Object Data
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2017; 112 (517): 397-409
View details for DOI 10.1080/01621459.2016.1147356
View details for Web of Science ID 000400765200033
-
A STUDY OF ERROR VARIANCE ESTIMATION IN LASSO REGRESSION
STATISTICA SINICA
2016; 26 (1): 35-67
View details for DOI 10.5705/ss.2014.042
View details for Web of Science ID 000368972400002
-
A Sparse-Group Lasso
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
2013; 22 (2): 231-245
View details for DOI 10.1080/10618600.2012.681250
View details for Web of Science ID 000319954000001
-
Fast sparse regression and classification
INTERNATIONAL JOURNAL OF FORECASTING
2012; 28 (3): 722-738
View details for DOI 10.1016/j.ijforecast.2012.05.001
View details for Web of Science ID 000306820500016
-
Rejoinder
INTERNATIONAL JOURNAL OF FORECASTING
2012; 28 (3): 751-753
View details for DOI 10.1016/j.ijforecast.2012.05.004
View details for Web of Science ID 000306820500020
-
Strong rules for discarding predictors in lasso-type problems
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
2012; 74: 245-266
Abstract
We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed 'SAFE' rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non-zero coefficients in the solution. We therefore combine them with simple checks of the Karush-Kuhn-Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush-Kuhn-Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
View details for DOI 10.1111/j.1467-9868.2011.01004.x
View details for Web of Science ID 000301286200004
View details for PubMedCentralID PMC4262615
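The screening rule in this abstract is simple enough to sketch directly. Below is a minimal NumPy illustration of the sequential strong rule and the KKT safeguard, assuming the lasso objective (1/2)||y - Xb||^2 + lam*||b||_1; the function names and scaling are illustrative assumptions, not the authors' code.

```python
import numpy as np

def strong_rule_screen(X, y, beta_prev, lam_prev, lam):
    """Sequential strong rule (sketch): at the new penalty `lam`,
    discard predictor j if |x_j' r(lam_prev)| < 2*lam - lam_prev,
    where r(lam_prev) is the residual at the previous solution.
    Returns the indices of predictors that survive screening."""
    r = y - X @ beta_prev                # residual at the previous lambda
    scores = np.abs(X.T @ r)             # univariate inner products
    return np.where(scores >= 2 * lam - lam_prev)[0]

def kkt_violations(X, y, beta, lam, tol=1e-8):
    """KKT safeguard (sketch): any predictor with |x_j' r| > lam at the
    candidate solution was wrongly discarded; violators are added back
    and the problem is re-solved, guaranteeing the exact solution."""
    r = y - X @ beta
    return np.where(np.abs(X.T @ r) > lam + tol)[0]
```

In practice the strong-rule survivors are handed to the solver, the KKT check is run on the result, and the (rare) violators are added to the active set before re-solving.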
-
New Insights and Faster Computations for the Graphical Lasso
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
2011; 20 (4): 892-900
View details for DOI 10.1198/jcgs.2011.11051a
View details for Web of Science ID 000298772600007
-
SparseNet: Coordinate Descent With Nonconvex Penalties
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2011; 106 (495): 1125-1138
Abstract
We address the problem of sparse selection in linear models. A number of nonconvex penalties have been proposed in the literature for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this article we pursue a coordinate-descent approach for optimization, and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty is ideally suited to this task, and we use it to demonstrate the performance of our algorithm. Certain technical derivations and experiments related to this article are included in the Supplementary Materials section.
View details for DOI 10.1198/jasa.2011.tm09738
View details for Web of Science ID 000296224200037
View details for PubMedCentralID PMC4286300
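The MC+ threshold function referred to in this abstract has a closed form (firm thresholding). A minimal sketch, assuming standardized predictors and the usual MC+ parametrization with gamma > 1; the function name is mine:

```python
import numpy as np

def mcplus_threshold(b, lam, gamma):
    """Firm-thresholding operator for the MC+ penalty (gamma > 1):
    the univariate coordinate-descent update. As gamma -> infinity it
    recovers soft thresholding (the lasso); as gamma -> 1 it approaches
    hard thresholding (best-subset behavior)."""
    a = np.abs(b)
    middle = np.sign(b) * (a - lam) / (1.0 - 1.0 / gamma)
    return np.where(a <= lam, 0.0, np.where(a <= gamma * lam, middle, b))
```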
-
Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent
JOURNAL OF STATISTICAL SOFTWARE
2011; 39 (5): 1-13
Abstract
We introduce a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of ℓ1 and ℓ2 penalties (elastic net). Our algorithm fits via cyclical coordinate descent, and employs warm starts to find a solution along a regularization path. We demonstrate the efficacy of our algorithm on real and simulated data sets, and find considerable speedup between our algorithm and competing methods.
View details for Web of Science ID 000288204000001
View details for PubMedCentralID PMC4824408
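The pathwise, warm-started structure described in the abstract is generic and easy to sketch. The skeleton below uses a Gaussian-loss entry point for the largest penalty, and `fit_at_lambda` is a hypothetical stand-in for one run of cyclical coordinate descent; for the Cox model the partial likelihood changes the inner solver, not this outer loop.

```python
import numpy as np

def regularization_path(X, y, fit_at_lambda, n_lambda=100, eps=1e-3):
    """Fit along a decreasing grid of penalties, warm-starting each fit
    at the previous solution (a sketch, not the authors' code)."""
    n = len(y)
    lam_max = np.max(np.abs(X.T @ y)) / n     # smallest lambda giving beta = 0 (Gaussian case)
    lams = np.geomspace(lam_max, eps * lam_max, n_lambda)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lams:
        beta = fit_at_lambda(X, y, lam, beta_init=beta)   # warm start
        path.append(beta.copy())
    return lams, np.array(path)
```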
-
REMEMBERING LEO
ANNALS OF APPLIED STATISTICS
2010; 4 (4): 1649-1651
View details for DOI 10.1214/10-AOAS432
View details for Web of Science ID 000295451000006
-
Regularization Paths for Generalized Linear Models via Coordinate Descent
JOURNAL OF STATISTICAL SOFTWARE
2010; 33 (1): 1-22
Abstract
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ(1) (the lasso), ℓ(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
View details for Web of Science ID 000275203200001
View details for PubMedCentralID PMC2929880
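The inner loop the abstract refers to is, in the Gaussian/lasso case, a cyclical sweep of soft-thresholding updates. A minimal sketch, assuming predictors standardized so that x_j'x_j/n = 1 (the elastic net adds a ridge term to the update's denominator):

```python
import numpy as np

def soft(z, g):
    """Soft-thresholding: S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclical coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1
    with standardized predictors (a sketch of the Gaussian case)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()            # residual for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]        # partial residual: add coordinate j back
            beta[j] = soft(X[:, j] @ r / n, lam)
            r -= X[:, j] * beta[j]        # restore the full residual
    return beta
```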
-
Sparse inverse covariance estimation with the graphical lasso
BIOSTATISTICS
2008; 9 (3): 432-441
Abstract
We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm--the graphical lasso--that is remarkably fast: It solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
View details for DOI 10.1093/biostatistics/kxm045
View details for Web of Science ID 000256977000005
View details for PubMedID 18079126
View details for PubMedCentralID PMC3019769
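The graphical lasso is now widely implemented; scikit-learn's version can be used to reproduce the idea on synthetic data (the penalty value below is arbitrary):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))        # 200 samples, 20 variables

# alpha is the l1 penalty applied to the inverse covariance (precision) matrix.
model = GraphicalLasso(alpha=0.1).fit(X)

# Zeros in the estimated precision matrix encode conditional independence,
# i.e. missing edges in the estimated Gaussian graphical model.
nonzero = np.abs(model.precision_) > 1e-8
print(nonzero.sum() - 20)                 # off-diagonal nonzeros (2 x number of edges)
```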
-
Response to Mease and Wyner, Evidence contrary to the statistical view of boosting, JMLR 9: 131-156, 2008
JOURNAL OF MACHINE LEARNING RESEARCH
2008; 9: 175-180
View details for Web of Science ID 000256641800005
-
Intruders pattern identification
19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6
2008: 452-?
View details for Web of Science ID 000264729000111
-
PATHWISE COORDINATE OPTIMIZATION
ANNALS OF APPLIED STATISTICS
2007; 1 (2): 302-332
View details for DOI 10.1214/07-AOAS131
View details for Web of Science ID 000261057600003
-
Applications of a new subspace clustering algorithm (COSA) in medical systems biology
METABOLOMICS
2007; 3 (1): 69-77
View details for DOI 10.1007/s11306-006-0045-z
View details for Web of Science ID 000245323200007
-
On bagging and nonlinear estimation
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
2007; 137 (3): 669-683
View details for DOI 10.1016/j.jspi.2006.06.002
View details for Web of Science ID 000242706300004
-
Recent advances in predictive (machine) learning
13th International Meeting of the Psychometric Society/68th Annual Meeting of the Psychometric Society
SPRINGER. 2006: 175-197
View details for DOI 10.1007/s00357-006-0012-4
View details for Web of Science ID 000241792300002
-
Comment: Classifier technology and the illusion of progress
STATISTICAL SCIENCE
2006; 21 (1): 15-18
View details for DOI 10.1214/0883423060000000240
View details for Web of Science ID 000238586200002
-
New similarity rules for mining data
16th Italian Workshop on Neural Nets
SPRINGER-VERLAG BERLIN. 2006: 179-187
View details for Web of Science ID 000237451800026
-
Note on "Comparison of model selection for regression" by Vladimir Cherkassky and Yunqian Ma
NEURAL COMPUTATION
2003; 15 (7): 1477-1480
Abstract
While Cherkassky and Ma (2003) raise some interesting issues in comparing techniques for model selection, their article appears to be written largely in protest of comparisons made in our book, Elements of Statistical Learning (2001). Cherkassky and Ma feel that we falsely represented the structural risk minimization (SRM) method, which they defend strongly here. In a two-page section of our book (pp. 212-213), we made an honest attempt to compare the SRM method with two related techniques, Akaike information criterion (AIC) and Bayesian information criterion (BIC). Apparently, we did not apply SRM in the optimal way. We are also accused of using contrived examples, designed to make SRM look bad. Alas, we did introduce some careless errors in our original simulation--errors that were corrected in the second and subsequent printings. Some of these errors were pointed out to us by Cherkassky and Ma (we supplied them with our source code), and as a result we replaced the assessment "SRM performs poorly overall" with a more moderate "the performance of SRM is mixed" (p. 212).
View details for Web of Science ID 000183421400002
View details for PubMedID 12816562
-
Multiple additive regression trees with application in epidemiology
STATISTICS IN MEDICINE
2003; 22 (9): 1365-1381
Abstract
Predicting future outcomes based on knowledge obtained from past observational data is a common application in a wide variety of areas of scientific research. In the present paper, prediction will be focused on various grades of cervical preneoplasia and neoplasia. Statistical tools used for prediction should of course possess predictive accuracy, and preferably meet secondary requirements such as speed, ease of use, and interpretability of the resulting predictive model. A new automated procedure based on an extension (called 'boosting') of classification and regression tree (CART) models is described. The resulting tool is a fast 'off-the-shelf' procedure for classification and regression that is competitive in accuracy with more customized approaches, while being fairly automatic to use (little tuning), and highly robust especially when applied to less than clean data. Additional tools are presented for interpreting and visualizing the results of such multiple additive regression tree (MART) models.
View details for PubMedID 12704603
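MART-style boosted trees are available off the shelf; below is a minimal sketch with scikit-learn's gradient boosting on synthetic data (the data and settings are illustrative, not the study's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for observational data with a binary outcome.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Many shallow trees with a small learning rate: the 'off-the-shelf'
# recipe described in the abstract, with little tuning required.
clf = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                 learning_rate=0.05).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))              # held-out accuracy
```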
-
Greedy function approximation: A gradient boosting machine
ANNALS OF STATISTICS
2001; 29 (5): 1189-1232
View details for Web of Science ID 000173361700001
-
Adaptive signal regression
Annual Meeting of the American Statistical Association, Statistical Computing Section
AMER STATISTICAL ASSOC. 1994: 100-105
View details for Web of Science ID A1994BD83T00012
-
THE Π METHOD FOR ESTIMATING MULTIVARIATE FUNCTIONS FROM NOISY DATA - DISCUSSION
TECHNOMETRICS
1991; 33 (2): 145-148
View details for Web of Science ID A1991FJ19800002
-
MULTIVARIATE ADAPTIVE REGRESSION SPLINES - REJOINDER
ANNALS OF STATISTICS
1991; 19 (1): 123-141
View details for Web of Science ID A1991FF04700011
-
REGULARIZED DISCRIMINANT-ANALYSIS
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
1989; 84 (405): 165-175
View details for Web of Science ID A1989U244700020
-
EXPLORATORY PROJECTION PURSUIT
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
1987; 82 (397): 249-266
View details for Web of Science ID A1987G462600044
-
PROJECTION PURSUIT - DISCUSSION
ANNALS OF STATISTICS
1985; 13 (2): 475-481
View details for Web of Science ID A1985AKQ9500002
-
THE MONOTONE SMOOTHING OF SCATTERPLOTS
TECHNOMETRICS
1984; 26 (3): 243-250
View details for Web of Science ID A1984TD92100006
-
DEVELOPMENTS IN LINEAR-REGRESSION METHODOLOGY - 1959-1982 - DISCUSSION
TECHNOMETRICS
1983; 25 (3): 246-247
View details for Web of Science ID A1983RC34300006
-
MEASUREMENT OF MULTIVARIATE SCALING AND FACTORIZATION IN EXCLUSIVE MULTIPARTICLE PRODUCTION
PHYSICAL REVIEW D
1974; 9 (11): 3053-3058
View details for Web of Science ID A1974T473500010