Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

By projecteuclid.org
Published On :: Mon, 17 Feb 2020 04:02 EST

Dimitris Bertsimas, Bart Van Parys.

Source: The Annals of Statistics, Volume 48, Number 1, 300–323.

Abstract:
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve the sparse regression problem to provable optimality, in seconds, for sample sizes $n$ and numbers of regressors $p$ in the 100,000s, that is, two orders of magnitude better than the current state of the art. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory, which suggests that the difficulty of a problem increases with problem size, the sparse regression problem becomes easier as the number of samples $n$ increases: the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso). For a small number of samples $n$, our approach takes longer to solve the problem, but importantly the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative to the heuristic methods available at present.
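To make "solving sparse regression to provable optimality" concrete, here is a minimal sketch on a toy problem. It is not the cutting plane method from the paper: it simply enumerates every support of size k, which is only feasible for very small $p$. The objective used (least squares plus a ridge term weighted by 1/(2*gamma)), the function names, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_on_support(X, y, support, gamma):
    """Ridge fit restricted to the columns in `support` (assumed objective)."""
    # Normal equations: (X_s^T X_s + I / gamma) w = X_s^T y
    gram = [[sum(row[a] * row[b] for row in X) + (1.0 / gamma if a == b else 0.0)
             for b in support] for a in support]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in support]
    w = solve_linear(gram, rhs)
    residuals = [yi - sum(row[a] * wa for a, wa in zip(support, w))
                 for row, yi in zip(X, y)]
    objective = (0.5 * sum(r * r for r in residuals)
                 + sum(wa * wa for wa in w) / (2.0 * gamma))
    return objective, w


def best_subset(X, y, k, gamma):
    """Exhaustive search over all supports of size k (toy sizes only)."""
    best = None
    for support in combinations(range(len(X[0])), k):
        objective, w = ridge_on_support(X, y, support, gamma)
        if best is None or objective < best[0]:
            best = (objective, support, w)
    return best


# Hypothetical toy data: y = 2 * column 0 - 3 * column 3, with no noise.
X = [[1, 0, 2, 1, 0],
     [0, 1, 1, 2, 1],
     [2, 1, 0, 0, 1],
     [1, 2, 1, 1, 0],
     [0, 0, 1, 2, 2],
     [1, 1, 2, 0, 1]]
y = [2 * row[0] - 3 * row[3] for row in X]

objective, support, weights = best_subset(X, y, k=2, gamma=1e6)
print(support, [round(w, 3) for w in weights])
```

On this noise-free toy data the search selects columns 0 and 3, the two columns used to build y. The number of supports grows combinatorially in $p$, which is why enumeration cannot reach the problem sizes the abstract reports.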

Statistical inference for model parameters in stochastic gradient descent

Spectral and matrix factorization methods for consistent community detection in multi-layer networks

New $G$-formula for the sequential causal effect and blip effect of treatment in sequential causal inference

Rerandomization in $2^{K}$ factorial experiments

The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

Two-step semiparametric empirical likelihood inference

Detecting relevant changes in the mean of nonstationary processes—A mass excess approach

Bootstrapping and sample splitting for high-dimensional, assumption-lean inference

Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors

On testing for high-dimensional white noise

A smeary central limit theorem for manifolds with application to high-dimensional spheres

Hypothesis testing on linear structures of high-dimensional covariance matrix

Quantile regression under memory constraint

Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models

Randomized incomplete $U$-statistics in high dimensions

Active ranking from pairwise comparisons and when parametric assumptions do not help

Testing for independence of large dimensional vectors

Projected spline estimation of the nonparametric function in high-dimensional partially linear models for massive data

Test for high-dimensional correlation matrices

Eigenvalue distributions of variance components estimators in high-dimensional random effects models

A unified treatment of multiple testing with prior knowledge using the p-filter

Distance multivariance: New dependence measures for random vectors

An operator theoretic approach to nonparametric mixture models

Linear hypothesis testing for high dimensional generalized linear models

Semiparametrically point-optimal hybrid rank tests for unit roots

Doubly penalized estimation in additive regression with high-dimensional data

Semi-supervised inference: General theory and estimation of means

A knockoff filter for high-dimensional selective inference

Property testing in high-dimensional Ising models

Isotonic regression in general dimensions

The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics

On testing conditional qualitative treatment effects

On deep learning as a remedy for the curse of dimensionality in nonparametric regression

Negative association, ordering and convergence of resampling methods

Spectral method and regularized MLE are both optimal for top-$K$ ranking

Generalized cluster trees and singular measures

Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem

Correction: Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects

Bayesian mixed effects models for zero-inflated compositions in microbiome data analysis

Estimating causal effects in studies of human brain function: New models, methods and estimands

A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies

Measuring human activity spaces from GPS data with density ranking and summary curves

Estimating the health effects of environmental mixtures using Bayesian semiparametric regression and sparsity inducing priors

Bayesian factor models for probabilistic cause of death assessment with verbal autopsies

Surface temperature monitoring in liver procurement via functional variance change-point analysis

Efficient real-time monitoring of an emerging influenza pandemic: How feasible?

Integrative survival analysis with uncertain event times in application to a suicide risk study

SHOPPER: A probabilistic model of consumer choice with substitutes and complements

The finish line: Attachment of Signs

The Finish Line: Adhesives vs. Mechanical Fasteners

The Finish Line: A (Faux) Monument for the Ages

Meeting Codes with Wall Assemblies

Green Advocacy vs. Informed Consent

Passive Houses Gain Momentum

American Industrial Partners to Acquire PPG’s Architectural Coatings Business

NCS Trust ‘sad and disappointed’ at government plans to shut it down

Fundraising Regulator appoints four new committee members

Veterans’ care charity to merge into larger counterpart

INTERVIEW: The Payback of a Green Investment

“Commitment to the Environment”

Securitas Technology Partners with K9s United in Support of Law Enforcement Canines

Incomplete information can fuel misjudgment: study

The Future of Morning Meals
