Stanford Advisees


  • Doctoral Dissertation Reader (AC)
    Claire Donnat

All Publications


  • rCOSA: A Software Package for Clustering Objects on Subsets of Attributes JOURNAL OF CLASSIFICATION Kampert, M. M., Meulman, J. J., Friedman, J. H. 2017; 34 (3): 514-547
  • A New Graph-Based Two-Sample Test for Multivariate and Object Data JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Chen, H., Friedman, J. H. 2017; 112 (517): 397-409
  • A STUDY OF ERROR VARIANCE ESTIMATION IN LASSO REGRESSION STATISTICA SINICA Reid, S., Tibshirani, R., Friedman, J. 2016; 26 (1): 35-67
  • A Sparse-Group Lasso JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS Simon, N., Friedman, J., Hastie, T., Tibshirani, R. 2013; 22 (2): 231-245
  • Fast sparse regression and classification INTERNATIONAL JOURNAL OF FORECASTING Friedman, J. H. 2012; 28 (3): 722-738
  • Rejoinder INTERNATIONAL JOURNAL OF FORECASTING Friedman, J. H. 2012; 28 (3): 751-753
  • Strong rules for discarding predictors in lasso-type problems JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., Tibshirani, R. J. 2012; 74: 245-266

    Abstract

    We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed 'SAFE' rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non-zero coefficients in the solution. We therefore combine them with simple checks of the Karush-Kuhn-Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush-Kuhn-Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.

    View details for DOI 10.1111/j.1467-9868.2011.01004.x

    View details for Web of Science ID 000301286200004

    View details for PubMedCentralID PMC4262615

  • New Insights and Faster Computations for the Graphical Lasso JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS Witten, D. M., Friedman, J. H., Simon, N. 2011; 20 (4): 892-900
  • SparseNet: Coordinate Descent With Nonconvex Penalties JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Mazumder, R., Friedman, J. H., Hastie, T. 2011; 106 (495): 1125-1138

    Abstract

    We address the problem of sparse selection in linear models. A number of nonconvex penalties have been proposed in the literature for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this article we pursue a coordinate-descent approach for optimization, and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty is ideally suited to this task, and we use it to demonstrate the performance of our algorithm. Certain technical derivations and experiments related to this article are included in the Supplementary Materials section.

    View details for DOI 10.1198/jasa.2011.tm09738

    View details for Web of Science ID 000296224200037

    View details for PubMedCentralID PMC4286300

  • Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent JOURNAL OF STATISTICAL SOFTWARE Simon, N., Friedman, J., Hastie, T., Tibshirani, R. 2011; 39 (5): 1-13

    Abstract

    We introduce a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of ℓ1 and ℓ2 penalties (elastic net). Our algorithm fits via cyclical coordinate descent, and employs warm starts to find a solution along a regularization path. We demonstrate the efficacy of our algorithm on real and simulated data sets, and find considerable speedup between our algorithm and competing methods.

    View details for Web of Science ID 000288204000001

    View details for PubMedCentralID PMC4824408

  • REMEMBERING LEO ANNALS OF APPLIED STATISTICS Friedman, J. H. 2010; 4 (4): 1649-1651

    View details for DOI 10.1214/10-AOAS432

    View details for Web of Science ID 000295451000006

  • Regularization Paths for Generalized Linear Models via Coordinate Descent JOURNAL OF STATISTICAL SOFTWARE Friedman, J., Hastie, T., Tibshirani, R. 2010; 33 (1): 1-22

    Abstract

    We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.

    View details for Web of Science ID 000275203200001

    View details for PubMedCentralID PMC2929880

  • Sparse inverse covariance estimation with the graphical lasso BIOSTATISTICS Friedman, J., Hastie, T., Tibshirani, R. 2008; 9 (3): 432-441

    Abstract

    We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the graphical lasso, that is remarkably fast: it solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.

    View details for DOI 10.1093/biostatistics/kxm045

    View details for Web of Science ID 000256977000005

    View details for PubMedID 18079126

    View details for PubMedCentralID PMC3019769

  • Response to Mease and Wyner, evidence contrary to the statistical view of boosting, JMLR 9: 131-156, 2008 JOURNAL OF MACHINE LEARNING RESEARCH Friedman, J., Hastie, T., Tibshirani, R. 2008; 9: 175-180
  • Intruders pattern identification 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6 Di Gesu, V., Lo Bosco, G., Friedman, J. H. 2008: 452-?
  • PATHWISE COORDINATE OPTIMIZATION ANNALS OF APPLIED STATISTICS Friedman, J., Hastie, T., Hoefling, H., Tibshirani, R. 2007; 1 (2): 302-332

    View details for DOI 10.1214/07-AOAS131

    View details for Web of Science ID 000261057600003

  • Applications of a new subspace clustering algorithm (COSA) in medical systems biology METABOLOMICS Damian, D., Oresic, M., Verheij, E., Meulman, J., Friedman, J., Adourian, A., Morel, N., Smilde, A., van der Greef, J. 2007; 3 (1): 69-77
  • On bagging and nonlinear estimation JOURNAL OF STATISTICAL PLANNING AND INFERENCE Friedman, J. H., Hall, P. 2007; 137 (3): 669-683
  • Recent advances in predictive (machine) learning 13th International Meeting of the Psychometric-Society/68th Annual Meeting of the Psychometric-Society Friedman, J. H. SPRINGER. 2006: 175-197
  • Comment: Classifier technology and the illusion of progress STATISTICAL SCIENCE Friedman, J. H. 2006; 21 (1): 15-18
  • New similarity rules for mining data 16th Italian Workshop on Neural Nets Di Gesu, V., Friedman, J. H. SPRINGER-VERLAG BERLIN. 2006: 179-187
  • Note on "Comparison of model selection for regression" by Vladimir Cherkassky and Yunqian Ma NEURAL COMPUTATION Hastie, T., Tibshirani, R., Friedman, J. 2003; 15 (7): 1477-1480

    Abstract

    While Cherkassky and Ma (2003) raise some interesting issues in comparing techniques for model selection, their article appears to be written largely in protest of comparisons made in our book, Elements of Statistical Learning (2001). Cherkassky and Ma feel that we falsely represented the structural risk minimization (SRM) method, which they defend strongly here. In a two-page section of our book (pp. 212-213), we made an honest attempt to compare the SRM method with two related techniques, Akaike information criterion (AIC) and Bayesian information criterion (BIC). Apparently, we did not apply SRM in the optimal way. We are also accused of using contrived examples, designed to make SRM look bad. Alas, we did introduce some careless errors in our original simulation--errors that were corrected in the second and subsequent printings. Some of these errors were pointed out to us by Cherkassky and Ma (we supplied them with our source code), and as a result we replaced the assessment "SRM performs poorly overall" with a more moderate "the performance of SRM is mixed" (p. 212).

    View details for Web of Science ID 000183421400002

    View details for PubMedID 12816562

  • Multiple additive regression trees with application in epidemiology STATISTICS IN MEDICINE Friedman, J. H., Meulman, J. J. 2003; 22 (9): 1365-1381

    Abstract

    Predicting future outcomes based on knowledge obtained from past observational data is a common application in a wide variety of areas of scientific research. In the present paper, prediction will be focused on various grades of cervical preneoplasia and neoplasia. Statistical tools used for prediction should of course possess predictive accuracy, and preferably meet secondary requirements such as speed, ease of use, and interpretability of the resulting predictive model. A new automated procedure based on an extension (called 'boosting') of regression and classification tree (CART) models is described. The resulting tool is a fast 'off-the-shelf' procedure for classification and regression that is competitive in accuracy with more customized approaches, while being fairly automatic to use (little tuning), and highly robust especially when applied to less than clean data. Additional tools are presented for interpreting and visualizing the results of such multiple additive regression tree (MART) models.

    View details for PubMedID 12704603

  • Greedy function approximation: A gradient boosting machine ANNALS OF STATISTICS Friedman, J. H. 2001; 29 (5): 1189-1232
  • Adaptive signal regression Annual Meeting of the American-Statistical-Association, Statistical-Computing-Section Land, S. R., Friedman, J. H. AMER STATISTICAL ASSOC. 1994: 100-105
  • THE Π METHOD FOR ESTIMATING MULTIVARIATE FUNCTIONS FROM NOISY DATA - DISCUSSION TECHNOMETRICS Friedman, J. H. 1991; 33 (2): 145-148
  • MULTIVARIATE ADAPTIVE REGRESSION SPLINES - REJOINDER ANNALS OF STATISTICS Friedman, J. H. 1991; 19 (1): 123-141
  • REGULARIZED DISCRIMINANT-ANALYSIS JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Friedman, J. H. 1989; 84 (405): 165-175
  • EXPLORATORY PROJECTION PURSUIT JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Friedman, J. H. 1987; 82 (397): 249-266
  • PROJECTION PURSUIT - DISCUSSION ANNALS OF STATISTICS Friedman, J. H. 1985; 13 (2): 475-481
  • THE MONOTONE SMOOTHING OF SCATTERPLOTS TECHNOMETRICS Friedman, J., Tibshirani, R. 1984; 26 (3): 243-250
  • DEVELOPMENTS IN LINEAR-REGRESSION METHODOLOGY - 1959-1982 - DISCUSSION TECHNOMETRICS Friedman, J. H. 1983; 25 (3): 246-247
  • MEASUREMENT OF MULTIVARIATE SCALING AND FACTORIZATION IN EXCLUSIVE MULTIPARTICLE PRODUCTION PHYSICAL REVIEW D Friedman, J. H. 1974; 9 (11): 3053-3058