an

Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo

Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spanò.

Source: The Annals of Statistics, Volume 48, Number 1, 560--583.

Abstract:
We study weighted particle systems in which new generations are resampled from current particles with probabilities proportional to their weights. This covers a broad class of sequential Monte Carlo (SMC) methods, widely-used in applied statistics and cognate disciplines. We consider the genealogical tree embedded into such particle systems, and identify conditions, as well as an appropriate time-scaling, under which they converge to the Kingman $n$-coalescent in the infinite system size limit in the sense of finite-dimensional distributions. Thus, the tractable $n$-coalescent can be used to predict the shape and size of SMC genealogies, as we illustrate by characterising the limiting mean and variance of the tree height. SMC genealogies are known to be connected to algorithm performance, so that our results are likely to have applications in the design of new methods as well. Our conditions for convergence are strong, but we show by simulation that they do not appear to be necessary.




an

Averages of unlabeled networks: Geometric characterization and asymptotic behavior

Eric D. Kolaczyk, Lizhen Lin, Steven Rosenberg, Jackson Walters, Jie Xu.

Source: The Annals of Statistics, Volume 48, Number 1, 514--538.

Abstract:
It is becoming increasingly common to see large collections of network data objects, that is, data sets in which a network is viewed as a fundamental unit of observation. As a result, there is a pressing need to develop network-based analogues of even many of the most basic tools already standard for scalar and vector data. In this paper, our focus is on averages of unlabeled, undirected networks with edge weights. Specifically, we (i) characterize a certain notion of the space of all such networks, (ii) describe key topological and geometric properties of this space relevant to doing probability and statistics thereupon, and (iii) use these properties to establish the asymptotic behavior of a generalized notion of an empirical mean under sampling from a distribution supported on this space. Our results rely on a combination of tools from geometry, probability theory and statistical shape analysis. In particular, the lack of vertex labeling necessitates working with a quotient space modding out permutations of labels. This results in a nontrivial geometry for the space of unlabeled networks, which in turn is found to have important implications on the types of probabilistic and statistical results that may be obtained and the techniques needed to obtain them.




an

Optimal prediction in the linearly transformed spiked model

Edgar Dobriban, William Leeb, Amit Singer.

Source: The Annals of Statistics, Volume 48, Number 1, 491--513.

Abstract:
We consider the linearly transformed spiked model , where the observations $Y_{i}$ are noisy linear transforms of unobserved signals of interest $X_{i}$: egin{equation*}Y_{i}=A_{i}X_{i}+varepsilon_{i},end{equation*} for $i=1,ldots ,n$. The transform matrices $A_{i}$ are also observed. We model the unobserved signals (or regression coefficients) $X_{i}$ as vectors lying on an unknown low-dimensional space. Given only $Y_{i}$ and $A_{i}$ how should we predict or recover their values? The naive approach of performing regression for each observation separately is inaccurate due to the large noise level. Instead, we develop optimal methods for predicting $X_{i}$ by “borrowing strength” across the different samples. Our linear empirical Bayes methods scale to large datasets and rely on weak moment assumptions. We show that this model has wide-ranging applications in signal processing, deconvolution, cryo-electron microscopy, and missing data with noise. For missing data, we show in simulations that our methods are more robust to noise and to unequal sampling than well-known matrix completion methods.




an

Consistent selection of the number of change-points via sample-splitting

Changliang Zou, Guanghui Wang, Runze Li.

Source: The Annals of Statistics, Volume 48, Number 1, 413--439.

Abstract:
In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion which balances a term quantifying model fit with a penalization term accounting for model complexity that increases with the number of change-points and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems and the optimal penalization magnitude usually varies from the model and error distribution. We propose a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures the fit of a specified model for a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under some mild conditions. Effectiveness of the proposed selection criterion is demonstrated on a variety of numerical experiments and real-data examples.




an

Concentration and consistency results for canonical and curved exponential-family models of random graphs

Michael Schweinberger, Jonathan Stewart.

Source: The Annals of Statistics, Volume 48, Number 1, 374--396.

Abstract:
Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.




an

The multi-armed bandit problem: An efficient nonparametric solution

Hock Peng Chan.

Source: The Annals of Statistics, Volume 48, Number 1, 346--373.

Abstract:
Lai and Robbins ( Adv. in Appl. Math. 6 (1985) 4–22) and Lai ( Ann. Statist. 15 (1987) 1091–1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback–Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures like $epsilon $-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under nonparametric settings. However, unlike UCB these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures.




an

Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

Dimitris Bertsimas, Bart Van Parys.

Source: The Annals of Statistics, Volume 48, Number 1, 300--323.

Abstract:
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes $n$ and number of regressors $p$ in the 100,000s, that is, two orders of magnitude better than the current state of the art, in seconds. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory which suggests that the difficulty of a problem increases with problem size, the sparse regression problem has the property that as the number of samples $n$ increases the problem becomes easier in that the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso), while for small number of samples $n$, our approach takes a larger amount of time to solve the problem, but importantly the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative over heuristic methods available at present.




an

Bootstrap confidence regions based on M-estimators under nonstandard conditions

Stephen M. S. Lee, Puyudi Yang.

Source: The Annals of Statistics, Volume 48, Number 1, 274--299.

Abstract:
Suppose that a confidence region is desired for a subvector $ heta $ of a multidimensional parameter $xi =( heta ,psi )$, based on an M-estimator $hat{xi }_{n}=(hat{ heta }_{n},hat{psi }_{n})$ calculated from a random sample of size $n$. Under nonstandard conditions $hat{xi }_{n}$ often converges at a nonregular rate $r_{n}$, in which case consistent estimation of the distribution of $r_{n}(hat{ heta }_{n}- heta )$, a pivot commonly chosen for confidence region construction, is most conveniently effected by the $m$ out of $n$ bootstrap. The above choice of pivot has three drawbacks: (i) the shape of the region is either subjectively prescribed or controlled by a computationally intensive depth function; (ii) the region is not transformation equivariant; (iii) $hat{xi }_{n}$ may not be uniquely defined. To resolve the above difficulties, we propose a one-dimensional pivot derived from the criterion function, and prove that its distribution can be consistently estimated by the $m$ out of $n$ bootstrap, or by a modified version of the perturbation bootstrap. This leads to a new method for constructing confidence regions which are transformation equivariant and have shapes driven solely by the criterion function. A subsampling procedure is proposed for selecting $m$ in practice. Empirical performance of the new method is illustrated with examples drawn from different nonstandard M-estimation settings. Extension of our theory to row-wise independent triangular arrays is also explored.




an

Spectral and matrix factorization methods for consistent community detection in multi-layer networks

Subhadeep Paul, Yuguo Chen.

Source: The Annals of Statistics, Volume 48, Number 1, 230--250.

Abstract:
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on the spectral clustering or a low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques along with the spectral clustering of mean adjacency matrix under a high dimensional setup, where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering on aggregate spectral kernel and module allegiance matrix in sparse networks, while they outperform the spectral clustering of mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities.




an

Adaptive risk bounds in univariate total variation denoising and trend filtering

Adityanand Guntuboyina, Donovan Lieu, Sabyasachi Chatterjee, Bodhisattva Sen.

Source: The Annals of Statistics, Volume 48, Number 1, 205--229.

Abstract:
We study trend filtering, a relatively recent method for univariate nonparametric regression. For a given integer $rgeq1$, the $r$th order trend filtering estimator is defined as the minimizer of the sum of squared errors when we constrain (or penalize) the sum of the absolute $r$th order discrete derivatives of the fitted function at the design points. For $r=1$, the estimator reduces to total variation regularization which has received much attention in the statistics and image processing literature. In this paper, we study the performance of the trend filtering estimator for every $rgeq1$, both in the constrained and penalized forms. Our main results show that in the strong sparsity setting when the underlying function is a (discrete) spline with few “knots,” the risk (under the global squared error loss) of the trend filtering estimator (with an appropriate choice of the tuning parameter) achieves the parametric $n^{-1}$-rate, up to a logarithmic (multiplicative) factor. Our results therefore provide support for the use of trend filtering, for every $rgeq1$, in the strong sparsity setting.




an

New $G$-formula for the sequential causal effect and blip effect of treatment in sequential causal inference

Xiaoqin Wang, Li Yin.

Source: The Annals of Statistics, Volume 48, Number 1, 138--160.

Abstract:
In sequential causal inference, two types of causal effects are of practical interest, namely, the causal effect of the treatment regime (called the sequential causal effect) and the blip effect of treatment on the potential outcome after the last treatment. The well-known $G$-formula expresses these causal effects in terms of the standard parameters. In this article, we obtain a new $G$-formula that expresses these causal effects in terms of the point observable effects of treatments similar to treatment in the framework of single-point causal inference. Based on the new $G$-formula, we estimate these causal effects by maximum likelihood via point observable effects with methods extended from single-point causal inference. We are able to increase precision of the estimation without introducing biases by an unsaturated model imposing constraints on the point observable effects. We are also able to reduce the number of point observable effects in the estimation by treatment assignment conditions.




an

Model assisted variable clustering: Minimax-optimal recovery and algorithms

Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen.

Source: The Annals of Statistics, Volume 48, Number 1, 111--137.

Abstract:
The problem of variable clustering is that of estimating groups of similar components of a $p$-dimensional vector $X=(X_{1},ldots ,X_{p})$ from $n$ independent copies of $X$. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of $G$-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations will all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a $G$-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to $G$-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular $K$-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.




an

Robust sparse covariance estimation by thresholding Tyler’s M-estimator

John Goes, Gilad Lerman, Boaz Nadler.

Source: The Annals of Statistics, Volume 48, Number 1, 86--110.

Abstract:
Estimating a high-dimensional sparse covariance matrix from a limited number of samples is a fundamental task in contemporary data analysis. Most proposals to date, however, are not robust to outliers or heavy tails. Toward bridging this gap, in this work we consider estimating a sparse shape matrix from $n$ samples following a possibly heavy-tailed elliptical distribution. We propose estimators based on thresholding either Tyler’s M-estimator or its regularized variant. We prove that in the joint limit as the dimension $p$ and the sample size $n$ tend to infinity with $p/n ogamma>0$, our estimators are minimax rate optimal. Results on simulated data support our theoretical analysis.




an

Sparse SIR: Optimal rates and adaptive estimation

Kai Tan, Lei Shi, Zhou Yu.

Source: The Annals of Statistics, Volume 48, Number 1, 64--85.

Abstract:
Sliced inverse regression (SIR) is an innovative and effective method for sufficient dimension reduction and data visualization. Recently, an impressive range of penalized SIR methods has been proposed to estimate the central subspace in a sparse fashion. Nonetheless, few of them considered the sparse sufficient dimension reduction from a decision-theoretic point of view. To address this issue, we in this paper establish the minimax rates of convergence for estimating the sparse SIR directions under various commonly used loss functions in the literature of sufficient dimension reduction. We also discover the possible trade-off between statistical guarantee and computational performance for sparse SIR. We finally propose an adaptive estimation scheme for sparse SIR which is computationally tractable and rate optimal. Numerical studies are carried out to confirm the theoretical properties of our proposed methods.




an

Rerandomization in $2^{K}$ factorial experiments

Xinran Li, Peng Ding, Donald B. Rubin.

Source: The Annals of Statistics, Volume 48, Number 1, 43--63.

Abstract:
With many pretreatment covariates and treatment factors, the classical factorial experiment often fails to balance covariates across multiple factorial effects simultaneously. Therefore, it is intuitive to restrict the randomization of the treatment factors to satisfy certain covariate balance criteria, possibly conforming to the tiers of factorial effects and covariates based on their relative importances. This is rerandomization in factorial experiments. We study the asymptotic properties of this experimental design under the randomization inference framework without imposing any distributional or modeling assumptions of the covariates and outcomes. We derive the joint asymptotic sampling distribution of the usual estimators of the factorial effects, and show that it is symmetric, unimodal and more “concentrated” at the true factorial effects under rerandomization than under the classical factorial experiment. We quantify this advantage of rerandomization using the notions of “central convex unimodality” and “peakedness” of the joint asymptotic sampling distribution. We also construct conservative large-sample confidence sets for the factorial effects.




an

The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

Emmanuel J. Candès, Pragya Sur.

Source: The Annals of Statistics, Volume 48, Number 1, 27--42.

Abstract:
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp “phase transition.” We introduce an explicit boundary curve $h_{mathrm{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n ightarrow kappa $, we show that if the problem is sufficiently high dimensional in the sense that $kappa >h_{mathrm{MLE}}$, then the MLE does not exist with probability one. Conversely, if $kappa <h_{mathrm{MLE}}$, the MLE asymptotically exists with probability one.




an

Detecting relevant changes in the mean of nonstationary processes—A mass excess approach

Holger Dette, Weichi Wu.

Source: The Annals of Statistics, Volume 47, Number 6, 3578--3608.

Abstract:
This paper considers the problem of testing if a sequence of means $(mu_{t})_{t=1,ldots ,n}$ of a nonstationary time series $(X_{t})_{t=1,ldots ,n}$ is stable in the sense that the difference of the means $mu_{1}$ and $mu_{t}$ between the initial time $t=1$ and any other time is smaller than a given threshold, that is $|mu_{1}-mu_{t}|leq c$ for all $t=1,ldots ,n$. A test for hypotheses of this type is developed using a bias corrected monotone rearranged local linear estimator and asymptotic normality of the corresponding test statistic is established. As the asymptotic variance depends on the location of the roots of the equation $|mu_{1}-mu_{t}|=c$ a new bootstrap procedure is proposed to obtain critical values and its consistency is established. As a consequence we are able to quantitatively describe relevant deviations of a nonstationary sequence from its initial value. The results are illustrated by means of a simulation study and by analyzing data examples.




an

Intrinsic Riemannian functional data analysis

Zhenhua Lin, Fang Yao.

Source: The Annals of Statistics, Volume 47, Number 6, 3533--3577.

Abstract:
In this work we develop a novel and foundational framework for analyzing general Riemannian functional data, in particular a new development of tensor Hilbert spaces along curves on a manifold. Such spaces enable us to derive Karhunen–Loève expansion for Riemannian random processes. This framework also features an approach to compare objects from different tensor Hilbert spaces, which paves the way for asymptotic analysis in Riemannian functional data analysis. Built upon intrinsic geometric concepts such as vector field, Levi-Civita connection and parallel transport on Riemannian manifolds, the developed framework applies to not only Euclidean submanifolds but also manifolds without a natural ambient space. As applications of this framework, we develop intrinsic Riemannian functional principal component analysis (iRFPCA) and intrinsic Riemannian functional linear regression (iRFLR) that are distinct from their traditional and ambient counterparts. We also provide estimation procedures for iRFPCA and iRFLR, and investigate their asymptotic properties within the intrinsic geometry. Numerical performance is illustrated by simulated and real examples.




an

Joint convergence of sample autocovariance matrices when &#36;p/n o 0&#36; with application

Monika Bhattacharjee, Arup Bose.

Source: The Annals of Statistics, Volume 47, Number 6, 3470--3503.

Abstract:
Consider a high-dimensional linear time series model where the dimension $p$ and the sample size $n$ grow in such a way that $p/n o 0$. Let $hat{Gamma }_{u}$ be the $u$th order sample autocovariance matrix. We first show that the LSD of any symmetric polynomial in ${hat{Gamma }_{u},hat{Gamma }_{u}^{*},ugeq 0}$ exists under independence and moment assumptions on the driving sequence together with weak assumptions on the coefficient matrices. This LSD result, with some additional effort, implies the asymptotic normality of the trace of any polynomial in ${hat{Gamma }_{u},hat{Gamma }_{u}^{*},ugeq 0}$. We also study similar results for several independent MA processes. We show applications of the above results to statistical inference problems such as in estimation of the unknown order of a high-dimensional MA process and in graphical and significance tests for hypotheses on coefficient matrices of one or several such independent processes.




an

Bootstrapping and sample splitting for high-dimensional, assumption-lean inference

Alessandro Rinaldo, Larry Wasserman, Max G’Sell.

Source: The Annals of Statistics, Volume 47, Number 6, 3438--3469.

Abstract:
Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.




an

Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors

Kyoungjae Lee, Jaeyong Lee, Lizhen Lin.

Source: The Annals of Statistics, Volume 47, Number 6, 3413--3437.

Abstract:
In this paper we study the high-dimensional sparse directed acyclic graph (DAG) models under the empirical sparse Cholesky prior. Among our results, strong model selection consistency or graph selection consistency is obtained under more general conditions than those in the existing literature. Compared to Cao, Khare and Ghosh [ Ann. Statist. (2019) 47 319–348], the required conditions are weakened in terms of the dimensionality, sparsity and lower bound of the nonzero elements in the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso-type methods. We also derive the posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among those of the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are the minimax or at least nearly minimax depending on the relative size of true sparseness for the entire dimension. The simulation study confirms that the proposed method outperforms the competing methods.




an

A smeary central limit theorem for manifolds with application to high-dimensional spheres

Benjamin Eltzner, Stephan F. Huckemann.

Source: The Annals of Statistics, Volume 47, Number 6, 3360--3381.

Abstract:
The (CLT) central limit theorems for generalized Fréchet means (data descriptors assuming values in manifolds, such as intrinsic means, geodesics, etc.) on manifolds from the literature are only valid if a certain empirical process of Hessians of the Fréchet function converges suitably, as in the proof of the prototypical BP-CLT [ Ann. Statist. 33 (2005) 1225–1259]. This is not valid in many realistic scenarios and we provide for a new very general CLT. In particular, this includes scenarios where, in a suitable chart, the sample mean fluctuates asymptotically at a scale $n^{alpha }$ with exponents $alpha <1/2$ with a nonnormal distribution. As the BP-CLT yields only fluctuations that are, rescaled with $n^{1/2}$, asymptotically normal, just as the classical CLT for random vectors, these lower rates, somewhat loosely called smeariness, had to date been observed only on the circle. We make the concept of smeariness on manifolds precise, give an example for two-smeariness on spheres of arbitrary dimension, and show that smeariness, although “almost never” occurring, may have serious statistical implications on a continuum of sample scenarios nearby. In fact, this effect increases with dimension, striking in particular in high dimension low sample size scenarios.




an

Hypothesis testing on linear structures of high-dimensional covariance matrix

Shurong Zheng, Zhao Chen, Hengjian Cui, Runze Li.

Source: The Annals of Statistics, Volume 47, Number 6, 3300--3334.

Abstract:
This paper is concerned with test of significance on high-dimensional covariance structures, and aims to develop a unified framework for testing commonly used linear covariance structures. We first construct a consistent estimator for parameters involved in the linear covariance structure, and then develop two tests for the linear covariance structures based on entropy loss and quadratic loss used for covariance matrix estimation. To study the asymptotic properties of the proposed tests, we study related high-dimensional random matrix theory, and establish several highly useful asymptotic results. With the aid of these asymptotic results, we derive the limiting distributions of these two tests under the null and alternative hypotheses. We further show that the quadratic loss based test is asymptotically unbiased. We conduct Monte Carlo simulation study to examine the finite sample performance of the two tests. Our simulation results show that the limiting null distributions approximate their null distributions quite well, and the corresponding asymptotic critical values keep Type I error rate very well. Our numerical comparison implies that the proposed tests outperform existing ones in terms of controlling Type I error rate and power. Our simulation indicates that the test based on quadratic loss seems to have better power than the test based on entropy loss.




an

Sampling and estimation for (sparse) exchangeable graphs

Victor Veitch, Daniel M. Roy.

Source: The Annals of Statistics, Volume 47, Number 6, 3274--3299.

Abstract:
Sparse exchangeable graphs on $mathbb{R}_{+}$, and the associated graphex framework for sparse graphs, generalize exchangeable graphs on $mathbb{N}$, and the associated graphon framework for dense graphs. We develop the graphex framework as a tool for statistical network analysis by identifying the sampling scheme that is naturally associated with the models of the framework, formalizing two natural notions of consistent estimation of the parameter (the graphex) underlying these models, and identifying general consistent estimators in each case. The sampling scheme is a modification of independent vertex sampling that throws away vertices that are isolated in the sampled subgraph. The estimators are variants of the empirical graphon estimator, which is known to be a consistent estimator for the distribution of dense exchangeable graphs; both can be understood as graph analogues to the empirical distribution in the i.i.d. sequence setting. Our results may be viewed as a generalization of consistent estimation via the empirical graphon from the dense graph regime to also include sparse graphs.




an

Quantile regression under memory constraint

Xi Chen, Weidong Liu, Yichen Zhang.

Source: The Annals of Statistics, Volume 47, Number 6, 3244--3273.

Abstract:
This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the naive divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch and then aggregates the estimators via averaging. However, this method only works when $n=o(m^{2})$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem under distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.




an

Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models

Xin Bing, Marten H. Wegkamp.

Source: The Annals of Statistics, Volume 47, Number 6, 3157--3184.

Abstract:
We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion differs notably from the one proposed in Bunea, She and Wegkamp ( Ann. Statist. 39 (2011) 1282–1309) in that it does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure, that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. We perform an extensive simulation study that confirms our theoretical findings. The new method performs better and is more stable than the procedure of Bunea, She and Wegkamp ( Ann. Statist. 39 (2011) 1282–1309) in both low- and high-dimensional settings.




an

Randomized incomplete &#36;U&#36;-statistics in high dimensions

Xiaohui Chen, Kengo Kato.

Source: The Annals of Statistics, Volume 47, Number 6, 3127--3156.

Abstract:
This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of big data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations tend to be both large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics is even more computationally expensive. To overcome such a computational bottleneck, incomplete $U$-statistics obtained by sampling fewer terms of the $U$-statistic are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive nonasymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both nondegenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated on the application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.




an

Active ranking from pairwise comparisons and when parametric assumptions do not help

Reinhard Heckel, Nihar B. Shah, Kannan Ramchandran, Martin J. Wainwright.

Source: The Annals of Statistics, Volume 47, Number 6, 3099--3126.

Abstract:
We consider sequential or active ranking of a set of $n$ items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of prespecified sizes according to their scores. This notion of ranking includes as special cases the identification of the top-$k$ items and the total ordering of the items. We first analyze a sequential ranking algorithm that counts the number of comparisons won, and uses these counts to decide whether to stop, or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point. We prove that this algorithm succeeds in recovering the ranking using a number of comparisons that is optimal up to logarithmic factors. This guarantee does depend on whether or not the underlying pairwise probability matrix, satisfies a particular structural property, unlike a significant body of past work on pairwise ranking based on parametric models such as the Thurstone or Bradley–Terry–Luce models. It has been a long-standing open question as to whether or not imposing these parametric assumptions allows for improved ranking algorithms. For stochastic comparison models, in which the pairwise probabilities are bounded away from zero, our second contribution is to resolve this issue by proving a lower bound for parametric models. This shows, perhaps surprisingly, that these popular parametric modeling choices offer at most logarithmic gains for stochastic comparisons.




an

Eigenvalue distributions of variance components estimators in high-dimensional random effects models

Zhou Fan, Iain M. Johnstone.

Source: The Annals of Statistics, Volume 47, Number 5, 2855--2886.

Abstract:
We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Our proof uses operator-valued free probability theory, and we establish a general asymptotic freeness result for families of rectangular orthogonally invariant random matrices, which is of independent interest. Our work is motivated in part by the estimation of components of covariance between multiple phenotypic traits in quantitative genetics, and we specialize our results to common experimental designs that arise in this application.




an

Distance multivariance: New dependence measures for random vectors

Björn Böttcher, Martin Keller-Ressel, René L. Schilling.

Source: The Annals of Statistics, Volume 47, Number 5, 2757--2789.

Abstract:
We introduce two new measures for the dependence of $nge2$ random variables: distance multivariance and total distance multivariance . Both measures are based on the weighted $L^{2}$-distance of quantities related to the characteristic functions of the underlying random variables. These extend distance covariance (introduced by Székely, Rizzo and Bakirov) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Under some mild moment conditions, this leads to a test for independence of multiple random vectors which is consistent against all alternatives.




an

Phase transition in the spiked random tensor with Rademacher prior

Wei-Kuo Chen.

Source: The Annals of Statistics, Volume 47, Number 5, 2734--2756.

Abstract:
We consider the problem of detecting a deformation from a symmetric Gaussian random $p$-tensor $(pgeq3)$ with a rank-one spike sampled from the Rademacher prior. Recently, in Lesieur et al. (Barbier, Krzakala, Macris, Miolane and Zdeborová (2017)), it was proved that there exists a critical threshold $eta_{p}$ so that when the signal-to-noise ratio exceeds $eta_{p}$, one can distinguish the spiked and unspiked tensors and weakly recover the prior via the minimal mean-square-error method. On the other side, Perry, Wein and Bandeira (Perry, Wein and Bandeira (2017)) proved that there exists a $eta_{p}'<eta_{p}$ such that any statistical hypothesis test cannot distinguish these two tensors, in the sense that their total variation distance asymptotically vanishes, when the signa-to-noise ratio is less than $eta_{p}'$. In this work, we show that $eta_{p}$ is indeed the critical threshold that strictly separates the distinguishability and indistinguishability between the two tensors under the total variation distance. Our approach is based on a subtle analysis of the high temperature behavior of the pure $p$-spin model with Ising spin, arising initially from the field of spin glasses. In particular, we identify the signal-to-noise criticality $eta_{p}$ as the critical temperature, distinguishing the high and low temperature behavior, of the Ising pure $p$-spin mean-field spin glass model.




an

An operator theoretic approach to nonparametric mixture models

Robert A. Vandermeulen, Clayton D. Scott.

Source: The Annals of Statistics, Volume 47, Number 5, 2704--2733.

Abstract:
When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations $n$ per group needed for the mixture model to be identifiable, as a function of the number $m$ of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability, where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce a general spectral algorithm for recovering the mixture components.




an

Semiparametrically point-optimal hybrid rank tests for unit roots

Bo Zhou, Ramon van den Akker, Bas J. M. Werker.

Source: The Annals of Statistics, Volume 47, Number 5, 2601--2638.

Abstract:
We propose a new class of unit root tests that exploits invariance properties in the Locally Asymptotically Brownian Functional limit experiment associated to the unit root model. The invariance structures naturally suggest tests that are based on the ranks of the increments of the observations, their average and an assumed reference density for the innovations. The tests are semiparametric in the sense that they are valid, that is, have the correct (asymptotic) size, irrespective of the true innovation density. For a correctly specified reference density, our test is point-optimal and nearly efficient. For arbitrary reference densities, we establish a Chernoff–Savage-type result, that is, our test performs as well as commonly used tests under Gaussian innovations but has improved power under other, for example, fat-tailed or skewed, innovation distributions. To avoid nonparametric estimation, we propose a simplified version of our test that exhibits the same asymptotic properties, except for the Chernoff–Savage result that we are only able to demonstrate by means of simulations.




an

Semi-supervised inference: General theory and estimation of means

Anru Zhang, Lawrence D. Brown, T. Tony Cai.

Source: The Annals of Statistics, Volume 47, Number 5, 2538--2566.

Abstract:
We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analysis on both the asymptotic distribution and $ell_{2}$-risk for the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.




an

The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics

Joshua Cape, Minh Tang, Carey E. Priebe.

Source: The Annals of Statistics, Volume 47, Number 5, 2405--2439.

Abstract:
The singular value matrix decomposition plays a ubiquitous role throughout statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and exploiting the geometric structure of singular values and singular vectors. This paper provides a novel collection of technical and theoretical tools for studying the geometry of singular subspaces using the two-to-infinity norm. Motivated by preliminary deterministic Procrustes analysis, we consider a general matrix perturbation setting in which we derive a new Procrustean matrix decomposition. Together with flexible machinery developed for the two-to-infinity norm, this allows us to conduct a refined analysis of the induced perturbation geometry with respect to the underlying singular vectors even in the presence of singular value multiplicity. Our analysis yields singular vector entrywise perturbation bounds for a range of popular matrix noise models, each of which has a meaningful associated statistical inference task. In addition, we demonstrate how the two-to-infinity norm is the preferred norm in certain statistical settings. Specific applications discussed in this paper include covariance estimation, singular subspace recovery, and multiple graph inference. Both our Procrustean matrix decomposition and the technical machinery developed for the two-to-infinity norm may be of independent interest.




an

Dynamic network models and graphon estimation

Marianna Pensky.

Source: The Annals of Statistics, Volume 47, Number 4, 2378--2403.

Abstract:
In the present paper, we consider a dynamic stochastic network model. The objective is estimation of the tensor of connection probabilities $mathbf{{Lambda}}$ when it is generated by a Dynamic Stochastic Block Model (DSBM) or a dynamic graphon. In particular, in the context of the DSBM, we derive a penalized least squares estimator $widehat{oldsymbol{Lambda}}$ of $mathbf{{Lambda}}$ and show that $widehat{oldsymbol{Lambda}}$ satisfies an oracle inequality and also attains minimax lower bounds for the risk. We extend those results to estimation of $mathbf{{Lambda}}$ when it is generated by a dynamic graphon function. The estimators constructed in the paper are adaptive to the unknown number of blocks in the context of the DSBM or to the smoothness of the graphon function. The technique relies on the vectorization of the model and leads to much simpler mathematical arguments than the ones used previously in the stationary set up. In addition, all results in the paper are nonasymptotic and allow a variety of extensions.




an

Convergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regression

Qian Qin, James P. Hobert.

Source: The Annals of Statistics, Volume 47, Number 4, 2320--2347.

Abstract:
The use of MCMC algorithms in high dimensional Bayesian problems has become routine. This has spurred so-called convergence complexity analysis, the goal of which is to ascertain how the convergence rate of a Monte Carlo Markov chain scales with sample size, $n$, and/or number of covariates, $p$. This article provides a thorough convergence complexity analysis of Albert and Chib’s [ J. Amer. Statist. Assoc. 88 (1993) 669–679] data augmentation algorithm for the Bayesian probit regression model. The main tools used in this analysis are drift and minorization conditions. The usual pitfalls associated with this type of analysis are avoided by utilizing centered drift functions, which are minimized in high posterior probability regions, and by using a new technique to suppress high-dimensionality in the construction of minorization conditions. The main result is that the geometric convergence rate of the underlying Markov chain is bounded below 1 both as $n ightarrowinfty$ (with $p$ fixed), and as $p ightarrowinfty$ (with $n$ fixed). Furthermore, the first computable bounds on the total variation distance to stationarity are byproducts of the asymptotic analysis.




an

Negative association, ordering and convergence of resampling methods

Mathieu Gerber, Nicolas Chopin, Nick Whiteley.

Source: The Annals of Statistics, Volume 47, Number 4, 2236--2260.

Abstract:
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [ J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Ckiffird and Fearnhead’s [ IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) (2001) 588–597 IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [ J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in $mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $mathbb{R}^{d}$, the variance of the resampling error is ${scriptstylemathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.




an

Spectral method and regularized MLE are both optimal for top-&#36;K&#36; ranking

Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang.

Source: The Annals of Statistics, Volume 47, Number 4, 2204--2235.

Abstract:
This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $mathop{mathrm{sin}} olimits Theta $ theorem for symmetric matrices. This also allows us to close the gap between the $ell_{2}$ error upper bound for the spectral method and the minimax lower limit.




an

Generalized cluster trees and singular measures

Yen-Chi Chen.

Source: The Annals of Statistics, Volume 47, Number 4, 2174--2203.

Abstract:
In this paper we study the $alpha $-cluster tree ($alpha $-tree) under both singular and nonsingular measures. The $alpha $-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $alpha $-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $alpha $-tree fails to converge in the $L_{infty }$ metric but is still consistent under the integrated distance. We also observe a new type of critical points—the dimensional critical points (DCPs)—of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $alpha $-tree.




an

Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem

James G. Scott, James O. Berger

Source: Ann. Statist., Volume 38, Number 5, 2587--2619.

Abstract:
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising aymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.




an

Liberty Alliance

Digital identity standards group. Set up at the instigation of Sun Microsystems in 2001, the Liberty Alliance Project is a consortium of technology vendors and consumer-facing enterprises formed "to establish an open standard for federated network identity." It aims to make it easier for consumers to access networked services from multiple suppliers while safeguarding security and privacy. Its specifications have been published in three phases: the Identity Federation Framework (ID-FF) came first; the Identity Web Services Framework (ID-WSF) followed in November 2003; and work is in progress on the Identity Services Interface Specifications (ID-SIS). Liberty Alliance specifications are closely linked to the SAML single sign-on standard, and overlap with elements of WS-Security.




an

granularity

How small the pieces are. When a system is split into components, it's important to get the right degree of componentization. Small, fine-grained components give much greater flexibility in assembling precisely the right combination of functionality, but they are more difficult to co-ordinate. Much larger, coarse-grained components are easier to manage but may become too unwieldy. Performance and management considerations tend to favor the use of more coarsely grained messages in a service oriented architecture, whereas earlier generations of distributed computing have preferred a much finer level of granularity.




an

governance

How an organization controls its actions. Governance describes the mechanisms an organization uses to ensure that its constituents follow its established processes and policies. It is the primary means of maintaining oversight and accountability in a loosely coupled organizational structure. A proper governance strategy implements systems to monitor and record what is going on, takes steps to ensure compliance with agreed policies, and provides for corrective action in cases where the rules have been ignored or misconstrued.




an

semantics

Intended meaning. In computing, semantics is the assumed or explicit set of understandings used in a system to give meaning to data. One of the biggest challenges when integrating separate computer systems and applications is to correctly match up the intended meanings within each system. Simple metadata classifications such as 'price' or 'location' may have wildly different meanings in each system, while apparently different terms, such as 'client' and 'patient' may turn out to be effectively equivalent.




an

Correction: Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects

Trang Quynh Nguyen, Elizabeth A. Stuart.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 518--520.