science and technology Bootstrapping and sample splitting for high-dimensional, assumption-lean inference By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Alessandro Rinaldo, Larry Wasserman, Max G’Sell. Source: The Annals of Statistics, Volume 47, Number 6, 3438--3469.Abstract: Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions. Full Article
science and technology Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Kyoungjae Lee, Jaeyong Lee, Lizhen Lin. Source: The Annals of Statistics, Volume 47, Number 6, 3413--3437.Abstract: In this paper we study the high-dimensional sparse directed acyclic graph (DAG) models under the empirical sparse Cholesky prior. Among our results, strong model selection consistency or graph selection consistency is obtained under more general conditions than those in the existing literature. Compared to Cao, Khare and Ghosh [ Ann. Statist. (2019) 47 319–348], the required conditions are weakened in terms of the dimensionality, sparsity and lower bound of the nonzero elements in the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso-type methods. We also derive the posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among those of the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are the minimax or at least nearly minimax depending on the relative size of true sparseness for the entire dimension. The simulation study confirms that the proposed method outperforms the competing methods. Full Article
science and technology On testing for high-dimensional white noise By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Zeng Li, Clifford Lam, Jianfeng Yao, Qiwei Yao. Source: The Annals of Statistics, Volume 47, Number 6, 3382--3412.Abstract: Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modeling and linear regression. For high-dimensional time series in the sense that the dimension $p$ is large in relation to the sample size $T$, the popular omnibus tests including the multivariate Hosking and Li–McLeod tests are extremely conservative, leading to substantial power loss. To develop more relevant tests for high-dimensional cases, we propose a portmanteau-type test statistic which is the sum of squared singular values of the first $q$ lagged sample autocovariance matrices. It, therefore, encapsulates all the serial correlations (up to the time lag $q$) within and across all component series. Using the tools from random matrix theory and assuming both $p$ and $T$ diverge to infinity, we derive the asymptotic normality of the test statistic under both the null and a specific VMA(1) alternative hypothesis. As the actual implementation of the test requires the knowledge of three characteristic constants of the population cross-sectional covariance matrix and the value of the fourth moment of the standardized innovations, nontrivial estimations are proposed for these parameters and their integration leads to a practically usable test. Extensive simulation confirms the excellent finite-sample performance of the new test with accurate size and satisfactory power for a large range of finite $(p,T)$ combinations, therefore, ensuring wide applicability in practice. In particular, the new tests are consistently superior to the traditional Hosking and Li–McLeod tests. Full Article
science and technology A smeary central limit theorem for manifolds with application to high-dimensional spheres By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Benjamin Eltzner, Stephan F. Huckemann. Source: The Annals of Statistics, Volume 47, Number 6, 3360--3381.Abstract: The (CLT) central limit theorems for generalized Fréchet means (data descriptors assuming values in manifolds, such as intrinsic means, geodesics, etc.) on manifolds from the literature are only valid if a certain empirical process of Hessians of the Fréchet function converges suitably, as in the proof of the prototypical BP-CLT [ Ann. Statist. 33 (2005) 1225–1259]. This is not valid in many realistic scenarios and we provide for a new very general CLT. In particular, this includes scenarios where, in a suitable chart, the sample mean fluctuates asymptotically at a scale $n^{alpha }$ with exponents $alpha <1/2$ with a nonnormal distribution. As the BP-CLT yields only fluctuations that are, rescaled with $n^{1/2}$, asymptotically normal, just as the classical CLT for random vectors, these lower rates, somewhat loosely called smeariness, had to date been observed only on the circle. We make the concept of smeariness on manifolds precise, give an example for two-smeariness on spheres of arbitrary dimension, and show that smeariness, although “almost never” occurring, may have serious statistical implications on a continuum of sample scenarios nearby. In fact, this effect increases with dimension, striking in particular in high dimension low sample size scenarios. Full Article
science and technology On optimal designs for nonregular models By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Yi Lin, Ryan Martin, Min Yang. Source: The Annals of Statistics, Volume 47, Number 6, 3335--3359.Abstract: Classically, Fisher information is the relevant object in defining optimal experimental designs. However, for models that lack certain regularity, the Fisher information does not exist, and hence, there is no notion of design optimality available in the literature. This article seeks to fill the gap by proposing a so-called Hellinger information , which generalizes Fisher information in the sense that the two measures agree in regular problems, but the former also exists for certain types of nonregular problems. We derive a Hellinger information inequality, showing that Hellinger information defines a lower bound on the local minimax risk of estimators. This provides a connection between features of the underlying model—in particular, the design—and the performance of estimators, motivating the use of this new Hellinger information for nonregular optimal design problems. Hellinger optimal designs are derived for several nonregular regression problems, with numerical results empirically demonstrating the efficiency of these designs compared to alternatives. Full Article
science and technology Hypothesis testing on linear structures of high-dimensional covariance matrix By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Shurong Zheng, Zhao Chen, Hengjian Cui, Runze Li. Source: The Annals of Statistics, Volume 47, Number 6, 3300--3334.Abstract: This paper is concerned with test of significance on high-dimensional covariance structures, and aims to develop a unified framework for testing commonly used linear covariance structures. We first construct a consistent estimator for parameters involved in the linear covariance structure, and then develop two tests for the linear covariance structures based on entropy loss and quadratic loss used for covariance matrix estimation. To study the asymptotic properties of the proposed tests, we study related high-dimensional random matrix theory, and establish several highly useful asymptotic results. With the aid of these asymptotic results, we derive the limiting distributions of these two tests under the null and alternative hypotheses. We further show that the quadratic loss based test is asymptotically unbiased. We conduct Monte Carlo simulation study to examine the finite sample performance of the two tests. Our simulation results show that the limiting null distributions approximate their null distributions quite well, and the corresponding asymptotic critical values keep Type I error rate very well. Our numerical comparison implies that the proposed tests outperform existing ones in terms of controlling Type I error rate and power. Our simulation indicates that the test based on quadratic loss seems to have better power than the test based on entropy loss. Full Article
science and technology Sampling and estimation for (sparse) exchangeable graphs By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Victor Veitch, Daniel M. Roy. Source: The Annals of Statistics, Volume 47, Number 6, 3274--3299.Abstract: Sparse exchangeable graphs on $mathbb{R}_{+}$, and the associated graphex framework for sparse graphs, generalize exchangeable graphs on $mathbb{N}$, and the associated graphon framework for dense graphs. We develop the graphex framework as a tool for statistical network analysis by identifying the sampling scheme that is naturally associated with the models of the framework, formalizing two natural notions of consistent estimation of the parameter (the graphex) underlying these models, and identifying general consistent estimators in each case. The sampling scheme is a modification of independent vertex sampling that throws away vertices that are isolated in the sampled subgraph. The estimators are variants of the empirical graphon estimator, which is known to be a consistent estimator for the distribution of dense exchangeable graphs; both can be understood as graph analogues to the empirical distribution in the i.i.d. sequence setting. Our results may be viewed as a generalization of consistent estimation via the empirical graphon from the dense graph regime to also include sparse graphs. Full Article
science and technology Quantile regression under memory constraint By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Xi Chen, Weidong Liu, Yichen Zhang. Source: The Annals of Statistics, Volume 47, Number 6, 3244--3273.Abstract: This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the naive divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch and then aggregates the estimators via averaging. However, this method only works when $n=o(m^{2})$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem under distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data. Full Article
science and technology On partial-sum processes of ARMAX residuals By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Steffen Grønneberg, Benjamin Holcblat. Source: The Annals of Statistics, Volume 47, Number 6, 3216--3243.Abstract: We establish general and versatile results regarding the limit behavior of the partial-sum process of ARMAX residuals. Illustrations include ARMA with seasonal dummies, misspecified ARMAX models with autocorrelated errors, nonlinear ARMAX models, ARMA with a structural break, a wide range of ARMAX models with infinite-variance errors, weak GARCH models and the consistency of kernel estimation of the density of ARMAX errors. Our results identify the limit distributions, and provide a general algorithm to obtain pivot statistics for CUSUM tests. Full Article
science and technology Statistical inference for autoregressive models under heteroscedasticity of unknown form By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Ke Zhu. Source: The Annals of Statistics, Volume 47, Number 6, 3185--3215.Abstract: This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets. Full Article
science and technology Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Xin Bing, Marten H. Wegkamp. Source: The Annals of Statistics, Volume 47, Number 6, 3157--3184.Abstract: We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion differs notably from the one proposed in Bunea, She and Wegkamp ( Ann. Statist. 39 (2011) 1282–1309) in that it does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure, that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. We perform an extensive simulation study that confirms our theoretical findings. The new method performs better and is more stable than the procedure of Bunea, She and Wegkamp ( Ann. Statist. 39 (2011) 1282–1309) in both low- and high-dimensional settings. Full Article
science and technology Randomized incomplete $U$-statistics in high dimensions By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Xiaohui Chen, Kengo Kato. Source: The Annals of Statistics, Volume 47, Number 6, 3127--3156.Abstract: This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of big data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations tend to be both large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics is even more computationally expensive. To overcome such a computational bottleneck, incomplete $U$-statistics obtained by sampling fewer terms of the $U$-statistic are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive nonasymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both nondegenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated on the application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature. Full Article
science and technology Active ranking from pairwise comparisons and when parametric assumptions do not help By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Reinhard Heckel, Nihar B. Shah, Kannan Ramchandran, Martin J. Wainwright. Source: The Annals of Statistics, Volume 47, Number 6, 3099--3126.Abstract: We consider sequential or active ranking of a set of $n$ items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of prespecified sizes according to their scores. This notion of ranking includes as special cases the identification of the top-$k$ items and the total ordering of the items. We first analyze a sequential ranking algorithm that counts the number of comparisons won, and uses these counts to decide whether to stop, or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point. We prove that this algorithm succeeds in recovering the ranking using a number of comparisons that is optimal up to logarithmic factors. This guarantee does depend on whether or not the underlying pairwise probability matrix, satisfies a particular structural property, unlike a significant body of past work on pairwise ranking based on parametric models such as the Thurstone or Bradley–Terry–Luce models. It has been a long-standing open question as to whether or not imposing these parametric assumptions allows for improved ranking algorithms. For stochastic comparison models, in which the pairwise probabilities are bounded away from zero, our second contribution is to resolve this issue by proving a lower bound for parametric models. This shows, perhaps surprisingly, that these popular parametric modeling choices offer at most logarithmic gains for stochastic comparisons. Full Article
science and technology Sorted concave penalized regression By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Long Feng, Cun-Hui Zhang. Source: The Annals of Statistics, Volume 47, Number 6, 3069--3098.Abstract: The Lasso is biased. Concave penalized least squares estimation (PLSE) takes advantage of signal strength to reduce this bias, leading to sharper error bounds in prediction, coefficient estimation and variable selection. For prediction and estimation, the bias of the Lasso can be also reduced by taking a smaller penalty level than what selection consistency requires, but such smaller penalty level depends on the sparsity of the true coefficient vector. The sorted $ell_{1}$ penalized estimation (Slope) was proposed for adaptation to such smaller penalty levels. However, the advantages of concave PLSE and Slope do not subsume each other. We propose sorted concave penalized estimation to combine the advantages of concave and sorted penalizations. We prove that sorted concave penalties adaptively choose the smaller penalty level and at the same time benefits from signal strength, especially when a significant proportion of signals are stronger than the corresponding adaptively selected penalty levels. A local convex approximation for sorted concave penalties, which extends the local linear and quadratic approximations for separable concave penalties, is developed to facilitate the computation of sorted concave PLSE and proven to possess desired prediction and estimation error bounds. Our analysis of prediction and estimation errors requires the restricted eigenvalue condition on the design, not beyond, and provides selection consistency under a required minimum signal strength condition in addition. Thus, our results also sharpens existing results on concave PLSE by removing the upper sparse eigenvalue component of the sparse Riesz condition. Full Article
science and technology Additive models with trend filtering By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Veeranjaneyulu Sadhanala, Ryan J. Tibshirani. Source: The Annals of Statistics, Volume 47, Number 6, 3032--3068.Abstract: We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their $k$th (discrete) derivative, for a chosen integer $kgeq0$. This results in $k$th degree piecewise polynomial components, (e.g., $k=0$ gives piecewise constant components, $k=1$ gives piecewise linear, $k=2$ gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical and computational properties, thanks in large part to the localized nature of the (discrete) total variation regularizer that it uses. On the theory side, we derive fast error rates for additive trend filtering estimates, and show these rates are minimax optimal when the underlying function is additive and has component functions whose derivatives are of bounded variation. We also show that these rates are unattainable by additive smoothing splines (and by additive models built from linear smoothers, in general). On the computational side, we use backfitting, to leverage fast univariate trend filtering solvers; we also describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as we can tell) is the first of its kind. Lastly, we present a number of experiments to examine the empirical performance of trend filtering. Full Article
science and technology Distributed estimation of principal eigenspaces By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Jianqing Fan, Dong Wang, Kaizheng Wang, Ziwei Zhu. Source: The Annals of Statistics, Volume 47, Number 6, 3009--3031.Abstract: Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that contribute to the most variation of the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top $K$ eigenvectors and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance for the resulting distributed estimator of the top $K$ eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased, and hence the distributed PCA is “unbiased.” We derive the rate of convergence for distributed PCA estimators, which depends explicitly on the effective rank of covariance, eigengap, and the number of machines. We show that when the number of machines is not unreasonably large, the distributed PCA performs as well as the whole sample PCA, even without full access of whole data. The theoretical results are verified by an extensive simulation study. We also extend our analysis to the heterogeneous case where the population covariance matrices are different across local machines but share similar top eigenstructures. Full Article
science and technology Testing for independence of large dimensional vectors By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Taras Bodnar, Holger Dette, Nestor Parolya. Source: The Annals of Statistics, Volume 47, Number 5, 2977--3008.Abstract: In this paper, new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose, we study the weak convergence of linear spectral statistics of central and (conditionally) noncentral Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large dimensional (conditionally) noncentral Fisher matrices is derived which is then used to analyse the power of the tests under the alternative. The theoretical results are illustrated by means of a simulation study where we also compare the new tests with several alternative, in particular with the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level, if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios. Full Article
science and technology Inference for the mode of a log-concave density By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Charles R. Doss, Jon A. Wellner. Source: The Annals of Statistics, Volume 47, Number 5, 2950--2976.Abstract: We study a likelihood ratio test for the location of the mode of a log-concave density. Our test is based on comparison of the log-likelihoods corresponding to the unconstrained maximum likelihood estimator of a log-concave density and the constrained maximum likelihood estimator where the constraint is that the mode of the density is fixed, say at $m$. The constrained estimation problem is studied in detail in Doss and Wellner (2018). Here, the results of that paper are used to show that, under the null hypothesis (and strict curvature of $-log f$ at the mode), the likelihood ratio statistic is asymptotically pivotal: that is, it converges in distribution to a limiting distribution which is free of nuisance parameters, thus playing the role of the $chi_{1}^{2}$ distribution in classical parametric statistical problems. By inverting this family of tests, we obtain new (likelihood ratio based) confidence intervals for the mode of a log-concave density $f$. These new intervals do not depend on any smoothing parameters. We study the new confidence intervals via Monte Carlo methods and illustrate them with two real data sets. The new intervals seem to have several advantages over existing procedures. Software implementing the test and confidence intervals is available in the R package verb+logcondens.mode+. Full Article
science and technology Projected spline estimation of the nonparametric function in high-dimensional partially linear models for massive data By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Heng Lian, Kaifeng Zhao, Shaogao Lv. Source: The Annals of Statistics, Volume 47, Number 5, 2922--2949.Abstract: In this paper, we consider the local asymptotics of the nonparametric function in a partially linear model, within the framework of the divide-and-conquer estimation. Unlike the fixed-dimensional setting in which the parametric part does not affect the nonparametric part, the high-dimensional setting makes the issue more complicated. In particular, when a sparsity-inducing penalty such as lasso is used to make the estimation of the linear part feasible, the bias introduced will propagate to the nonparametric part. We propose a novel approach for estimation of the nonparametric function and establish the local asymptotics of the estimator. The result is useful for massive data with possibly different linear coefficients in each subpopulation but common nonparametric function. Some numerical illustrations are also presented. Full Article
science and technology Test for high-dimensional correlation matrices By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Shurong Zheng, Guanghui Cheng, Jianhua Guo, Hongtu Zhu. Source: The Annals of Statistics, Volume 47, Number 5, 2887--2921.Abstract: Testing correlation structures has attracted extensive attention in the literature due to both its importance in real applications and several major theoretical challenges. The aim of this paper is to develop a general framework of testing correlation structures for the one , two and multiple sample testing problems under a high-dimensional setting when both the sample size and data dimension go to infinity. Our test statistics are designed to deal with both the dense and sparse alternatives. We systematically investigate the asymptotic null distribution, power function and unbiasedness of each test statistic. Theoretically, we make great efforts to deal with the nonindependency of all random matrices of the sample correlation matrices. We use simulation studies and real data analysis to illustrate the versatility and practicability of our test statistics. Full Article
science and technology Eigenvalue distributions of variance components estimators in high-dimensional random effects models By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Zhou Fan, Iain M. Johnstone. Source: The Annals of Statistics, Volume 47, Number 5, 2855--2886.Abstract: We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Our proof uses operator-valued free probability theory, and we establish a general asymptotic freeness result for families of rectangular orthogonally invariant random matrices, which is of independent interest. Our work is motivated in part by the estimation of components of covariance between multiple phenotypic traits in quantitative genetics, and we specialize our results to common experimental designs that arise in this application. Full Article
science and technology Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Aryeh Kontorovich, Iosif Pinelis. Source: The Annals of Statistics, Volume 47, Number 5, 2822--2854.Abstract: We provide an exact nonasymptotic lower bound on the minimax expected excess risk (EER) in the agnostic probably-approximately-correct (PAC) machine learning classification model and identify minimax learning algorithms as certain maximally symmetric and minimally randomized “voting” procedures. Based on this result, an exact asymptotic lower bound on the minimax EER is provided. This bound is of the simple form $c_{infty}/sqrt{ u}$ as $ u oinfty$, where $c_{infty}=0.16997dots$ is a universal constant, $ u=m/d$, $m$ is the size of the training sample and $d$ is the Vapnik–Chervonenkis dimension of the hypothesis class. It is shown that the differences between these asymptotic and nonasymptotic bounds, as well as the differences between these two bounds and the maximum EER of any learning algorithms that minimize the empirical risk, are asymptotically negligible, and all these differences are due to ties in the mentioned “voting” procedures. A few easy to compute nonasymptotic lower bounds on the minimax EER are also obtained, which are shown to be close to the exact asymptotic lower bound $c_{infty}/sqrt{ u}$ even for rather small values of the ratio $ u=m/d$. As an application of these results, we substantially improve existing lower bounds on the tail probability of the excess risk. Among the tools used are Bayes estimation and apparently new identities and inequalities for binomial distributions. Full Article
science and technology A unified treatment of multiple testing with prior knowledge using the p-filter By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Aaditya K. Ramdas, Rina F. Barber, Martin J. Wainwright, Michael I. Jordan. Source: The Annals of Statistics, Volume 47, Number 5, 2790--2821.Abstract: There is a significant literature on methods for incorporating knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by nonuniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary partitions of the hypotheses into (possibly overlapping) groups and (d) knowledge of independence, positive or arbitrary dependence between hypotheses or groups, suggesting the use of more aggressive or conservative procedures. We present a unified algorithmic framework called p-filter for global null testing and false discovery rate (FDR) control that allows the scientist to incorporate all four types of prior knowledge (a)–(d) simultaneously, recovering a variety of known algorithms as special cases. Full Article
science and technology Distance multivariance: New dependence measures for random vectors By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Björn Böttcher, Martin Keller-Ressel, René L. Schilling. Source: The Annals of Statistics, Volume 47, Number 5, 2757--2789.Abstract: We introduce two new measures for the dependence of $nge2$ random variables: distance multivariance and total distance multivariance . Both measures are based on the weighted $L^{2}$-distance of quantities related to the characteristic functions of the underlying random variables. These extend distance covariance (introduced by Székely, Rizzo and Bakirov) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Under some mild moment conditions, this leads to a test for independence of multiple random vectors which is consistent against all alternatives. Full Article
science and technology Phase transition in the spiked random tensor with Rademacher prior By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Wei-Kuo Chen. Source: The Annals of Statistics, Volume 47, Number 5, 2734--2756.Abstract: We consider the problem of detecting a deformation from a symmetric Gaussian random $p$-tensor $(pgeq3)$ with a rank-one spike sampled from the Rademacher prior. Recently, in Lesieur et al. (Barbier, Krzakala, Macris, Miolane and Zdeborová (2017)), it was proved that there exists a critical threshold $eta_{p}$ so that when the signal-to-noise ratio exceeds $eta_{p}$, one can distinguish the spiked and unspiked tensors and weakly recover the prior via the minimal mean-square-error method. On the other side, Perry, Wein and Bandeira (Perry, Wein and Bandeira (2017)) proved that there exists a $eta_{p}'<eta_{p}$ such that any statistical hypothesis test cannot distinguish these two tensors, in the sense that their total variation distance asymptotically vanishes, when the signa-to-noise ratio is less than $eta_{p}'$. In this work, we show that $eta_{p}$ is indeed the critical threshold that strictly separates the distinguishability and indistinguishability between the two tensors under the total variation distance. Our approach is based on a subtle analysis of the high temperature behavior of the pure $p$-spin model with Ising spin, arising initially from the field of spin glasses. In particular, we identify the signal-to-noise criticality $eta_{p}$ as the critical temperature, distinguishing the high and low temperature behavior, of the Ising pure $p$-spin mean-field spin glass model. Full Article
science and technology An operator theoretic approach to nonparametric mixture models By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Robert A. Vandermeulen, Clayton D. Scott. Source: The Annals of Statistics, Volume 47, Number 5, 2704--2733.Abstract: When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations $n$ per group needed for the mixture model to be identifiable, as a function of the number $m$ of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability, where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce a general spectral algorithm for recovering the mixture components. Full Article
science and technology Linear hypothesis testing for high dimensional generalized linear models By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Chengchun Shi, Rui Song, Zhao Chen, Runze Li. Source: The Annals of Statistics, Volume 47, Number 5, 2671--2703.Abstract: This paper is concerned with testing linear hypotheses in high dimensional generalized linear models. To deal with linear hypotheses, we first propose the constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are $chi^{2}$ distribution with the same degrees of freedom, and under local alternatives, they asymptotically follow noncentral $chi^{2}$ distributions with the same degrees of freedom and noncentral parameter, provided the number of parameters involved in the test hypothesis grows to $infty$ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures. Full Article
science and technology The middle-scale asymptotics of Wishart matrices By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Didier Chételat, Martin T. Wells. Source: The Annals of Statistics, Volume 47, Number 5, 2639--2670.Abstract: We study the behavior of a real $p$-dimensional Wishart random matrix with $n$ degrees of freedom when $n,p ightarrowinfty$ but $p/n ightarrow0$. We establish the existence of phase transitions when $p$ grows at the order $n^{(K+1)/(K+3)}$ for every $Kinmathbb{N}$, and derive expressions for approximating densities between every two phase transitions. To do this, we make use of a novel tool we call the $mathcal{F}$-conjugate of an absolutely continuous distribution, which is obtained from the Fourier transform of the square root of its density. In the case of the normalized Wishart distribution, this represents an extension of the $t$-distribution to the space of real symmetric matrices. Full Article
science and technology Semiparametrically point-optimal hybrid rank tests for unit roots By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Bo Zhou, Ramon van den Akker, Bas J. M. Werker. Source: The Annals of Statistics, Volume 47, Number 5, 2601--2638.Abstract: We propose a new class of unit root tests that exploits invariance properties in the Locally Asymptotically Brownian Functional limit experiment associated to the unit root model. The invariance structures naturally suggest tests that are based on the ranks of the increments of the observations, their average and an assumed reference density for the innovations. The tests are semiparametric in the sense that they are valid, that is, have the correct (asymptotic) size, irrespective of the true innovation density. For a correctly specified reference density, our test is point-optimal and nearly efficient. For arbitrary reference densities, we establish a Chernoff–Savage-type result, that is, our test performs as well as commonly used tests under Gaussian innovations but has improved power under other, for example, fat-tailed or skewed, innovation distributions. To avoid nonparametric estimation, we propose a simplified version of our test that exhibits the same asymptotic properties, except for the Chernoff–Savage result that we are only able to demonstrate by means of simulations. Full Article
science and technology Doubly penalized estimation in additive regression with high-dimensional data By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Zhiqiang Tan, Cun-Hui Zhang. Source: The Annals of Statistics, Volume 47, Number 5, 2567--2600.Abstract: Additive regression provides an extension of linear regression by modeling the signal of a response as a sum of functions of covariates of relatively low complexity. We study penalized estimation in high-dimensional nonparametric additive regression where functional semi-norms are used to induce smoothness of component functions and the empirical $L_{2}$ norm is used to induce sparsity. The functional semi-norms can be of Sobolev or bounded variation types and are allowed to be different amongst individual component functions. We establish oracle inequalities for the predictive performance of such methods under three simple technical conditions: a sub-Gaussian condition on the noise, a compatibility condition on the design and the functional classes under consideration and an entropy condition on the functional classes. For random designs, the sample compatibility condition can be replaced by its population version under an additional condition to ensure suitable convergence of empirical norms. In homogeneous settings where the complexities of the component functions are of the same order, our results provide a spectrum of minimax convergence rates, from the so-called slow rate without requiring the compatibility condition to the fast rate under the hard sparsity or certain $L_{q}$ sparsity to allow many small components in the true regression function. These results significantly broaden and sharpen existing ones in the literature. Full Article
science and technology Semi-supervised inference: General theory and estimation of means By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Anru Zhang, Lawrence D. Brown, T. Tony Cai. Source: The Annals of Statistics, Volume 47, Number 5, 2538--2566.Abstract: We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analysis on both the asymptotic distribution and $ell_{2}$-risk for the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population. Full Article
science and technology A knockoff filter for high-dimensional selective inference By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Rina Foygel Barber, Emmanuel J. Candès. Source: The Annals of Statistics, Volume 47, Number 5, 2504--2537.Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy—a fake variable serving as a control—for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure “discovers” important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is nonasymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype. Full Article
science and technology Property testing in high-dimensional Ising models By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Matey Neykov, Han Liu. Source: The Annals of Statistics, Volume 47, Number 5, 2472--2503.Abstract: This paper explores the information-theoretic limitations of graph property testing in zero-field Ising models. Instead of learning the entire graph structure, sometimes testing a basic graph property such as connectivity, cycle presence or maximum clique size is a more relevant and attainable objective. Since property testing is more fundamental than graph recovery, any necessary conditions for property testing imply corresponding conditions for graph recovery, while custom property tests can be statistically and/or computationally more efficient than graph recovery based algorithms. Understanding the statistical complexity of property testing requires the distinction of ferromagnetic (i.e., positive interactions only) and general Ising models. Using combinatorial constructs such as graph packing and strong monotonicity, we characterize how target properties affect the corresponding minimax upper and lower bounds within the realm of ferromagnets. On the other hand, by studying the detection of an antiferromagnetic (i.e., negative interactions only) Curie–Weiss model buried in Rademacher noise, we show that property testing is strictly more challenging over general Ising models. In terms of methodological development, we propose two types of correlation based tests: computationally efficient screening for ferromagnets, and score type tests for general models, including a fast cycle presence test. Our correlation screening tests match the information-theoretic bounds for property testing in ferromagnets in certain regimes. Full Article
science and technology Isotonic regression in general dimensions By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Qiyang Han, Tengyao Wang, Sabyasachi Chatterjee, Richard J. Samworth. Source: The Annals of Statistics, Volume 47, Number 5, 2440--2471.Abstract: We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^{d}$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-min{2/(d+2),1/d}}$ in the empirical $L_{2}$ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{min(1,2/d)}$, again up to polylogarithmic factors. Previous results are confined to the case $dleq2$. Finally, we establish corresponding bounds (which are new even in the case $d=2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to polylogarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate. Full Article
science and technology The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Joshua Cape, Minh Tang, Carey E. Priebe. Source: The Annals of Statistics, Volume 47, Number 5, 2405--2439.Abstract: The singular value matrix decomposition plays a ubiquitous role throughout statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and exploiting the geometric structure of singular values and singular vectors. This paper provides a novel collection of technical and theoretical tools for studying the geometry of singular subspaces using the two-to-infinity norm. Motivated by preliminary deterministic Procrustes analysis, we consider a general matrix perturbation setting in which we derive a new Procrustean matrix decomposition. Together with flexible machinery developed for the two-to-infinity norm, this allows us to conduct a refined analysis of the induced perturbation geometry with respect to the underlying singular vectors even in the presence of singular value multiplicity. Our analysis yields singular vector entrywise perturbation bounds for a range of popular matrix noise models, each of which has a meaningful associated statistical inference task. In addition, we demonstrate how the two-to-infinity norm is the preferred norm in certain statistical settings. Specific applications discussed in this paper include covariance estimation, singular subspace recovery, and multiple graph inference. Both our Procrustean matrix decomposition and the technical machinery developed for the two-to-infinity norm may be of independent interest. Full Article
science and technology Cross validation for locally stationary processes By projecteuclid.org Published On :: Wed, 22 May 2019 04:01 EDT Stefan Richter, Rainer Dahlhaus. Source: The Annals of Statistics, Volume 47, Number 4, 2145--2173.Abstract: We propose an adaptive bandwidth selector via cross validation for local M-estimators in locally stationary processes. We prove asymptotic optimality of the procedure under mild conditions on the underlying parameter curves. The results are applicable to a wide range of locally stationary processes such linear and nonlinear processes. A simulation study shows that the method works fairly well also in misspecified situations. Full Article
science and technology Dynamic network models and graphon estimation By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Marianna Pensky. Source: The Annals of Statistics, Volume 47, Number 4, 2378--2403.Abstract: In the present paper, we consider a dynamic stochastic network model. The objective is estimation of the tensor of connection probabilities $mathbf{{Lambda}}$ when it is generated by a Dynamic Stochastic Block Model (DSBM) or a dynamic graphon. In particular, in the context of the DSBM, we derive a penalized least squares estimator $widehat{oldsymbol{Lambda}}$ of $mathbf{{Lambda}}$ and show that $widehat{oldsymbol{Lambda}}$ satisfies an oracle inequality and also attains minimax lower bounds for the risk. We extend those results to estimation of $mathbf{{Lambda}}$ when it is generated by a dynamic graphon function. The estimators constructed in the paper are adaptive to the unknown number of blocks in the context of the DSBM or to the smoothness of the graphon function. The technique relies on the vectorization of the model and leads to much simpler mathematical arguments than the ones used previously in the stationary set up. In addition, all results in the paper are nonasymptotic and allow a variety of extensions. Full Article
science and technology On testing conditional qualitative treatment effects By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Chengchun Shi, Rui Song, Wenbin Lu. Source: The Annals of Statistics, Volume 47, Number 4, 2348--2377.Abstract: Precision medicine is an emerging medical paradigm that focuses on finding the most effective treatment strategy tailored for individual patients. In the literature, most of the existing works focused on estimating the optimal treatment regime. However, there has been less attention devoted to hypothesis testing regarding the optimal treatment regime. In this paper, we first introduce the notion of conditional qualitative treatment effects (CQTE) of a set of variables given another set of variables and provide a class of equivalent representations for the null hypothesis of no CQTE. The proposed definition of CQTE does not assume any parametric form for the optimal treatment rule and plays an important role for assessing the incremental value of a set of new variables in optimal treatment decision making conditional on an existing set of prescriptive variables. We then propose novel testing procedures for no CQTE based on kernel estimation of the conditional contrast functions. We show that our test statistics have asymptotically correct size and nonnegligible power against some nonstandard local alternatives. The empirical performance of the proposed tests are evaluated by simulations and an application to an AIDS data set. Full Article
science and technology Convergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regression By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Qian Qin, James P. Hobert. Source: The Annals of Statistics, Volume 47, Number 4, 2320--2347.Abstract: The use of MCMC algorithms in high dimensional Bayesian problems has become routine. This has spurred so-called convergence complexity analysis, the goal of which is to ascertain how the convergence rate of a Monte Carlo Markov chain scales with sample size, $n$, and/or number of covariates, $p$. This article provides a thorough convergence complexity analysis of Albert and Chib’s [ J. Amer. Statist. Assoc. 88 (1993) 669–679] data augmentation algorithm for the Bayesian probit regression model. The main tools used in this analysis are drift and minorization conditions. The usual pitfalls associated with this type of analysis are avoided by utilizing centered drift functions, which are minimized in high posterior probability regions, and by using a new technique to suppress high-dimensionality in the construction of minorization conditions. The main result is that the geometric convergence rate of the underlying Markov chain is bounded below 1 both as $n ightarrowinfty$ (with $p$ fixed), and as $p ightarrowinfty$ (with $n$ fixed). Furthermore, the first computable bounds on the total variation distance to stationarity are byproducts of the asymptotic analysis. Full Article
science and technology Convergence rates of least squares regression estimators with heavy-tailed errors By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Qiyang Han, Jon A. Wellner. Source: The Annals of Statistics, Volume 47, Number 4, 2286--2319.Abstract: We study the performance of the least squares estimator (LSE) in a general nonparametric regression model, when the errors are independent of the covariates but may only have a $p$th moment ($pgeq1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard “entropy condition” with exponent $alphain(0,2)$, then the $L_{2}$ loss of the LSE converges at a rate [mathcal{O}_{mathbf{P}}igl(n^{-frac{1}{2+alpha}}vee n^{-frac{1}{2}+frac{1}{2p}}igr).] Such a rate cannot be improved under the entropy condition alone. This rate quantifies both some positive and negative aspects of the LSE in a heavy-tailed regression setting. On the positive side, as long as the errors have $pgeq1+2/alpha$ moments, the $L_{2}$ loss of the LSE converges at the same rate as if the errors are Gaussian. On the negative side, if $p<1+2/alpha$, there are (many) hard models at any entropy level $alpha$ for which the $L_{2}$ loss of the LSE converges at a strictly slower rate than other robust estimators. The validity of the above rate relies crucially on the independence of the covariates and the errors. In fact, the $L_{2}$ loss of the LSE can converge arbitrarily slowly when the independence fails. The key technical ingredient is a new multiplier inequality that gives sharp bounds for the “multiplier empirical process” associated with the LSE. We further give an application to the sparse linear regression model with heavy-tailed covariates and errors to demonstrate the scope of this new inequality. Full Article
science and technology On deep learning as a remedy for the curse of dimensionality in nonparametric regression By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Benedikt Bauer, Michael Kohler. Source: The Annals of Statistics, Volume 47, Number 4, 2261--2285.Abstract: Assuming that a smoothness condition and a suitable restriction on the structure of the regression function hold, it is shown that least squares estimates based on multilayer feedforward neural networks are able to circumvent the curse of dimensionality in nonparametric regression. The proof is based on new approximation results concerning multilayer feedforward neural networks with bounded weights and a bounded number of hidden neurons. The estimates are compared with various other approaches by using simulated data. Full Article
science and technology Negative association, ordering and convergence of resampling methods By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Mathieu Gerber, Nicolas Chopin, Nick Whiteley. Source: The Annals of Statistics, Volume 47, Number 4, 2236--2260.Abstract: We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [ J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Ckiffird and Fearnhead’s [ IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) (2001) 588–597 IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [ J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in $mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $mathbb{R}^{d}$, the variance of the resampling error is ${scriptstylemathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling. Full Article
science and technology Spectral method and regularized MLE are both optimal for top-$K$ ranking By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang. Source: The Annals of Statistics, Volume 47, Number 4, 2204--2235.Abstract: This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $mathop{mathrm{sin}} olimits Theta $ theorem for symmetric matrices. This also allows us to close the gap between the $ell_{2}$ error upper bound for the spectral method and the minimax lower limit. Full Article
science and technology Generalized cluster trees and singular measures By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Yen-Chi Chen. Source: The Annals of Statistics, Volume 47, Number 4, 2174--2203.Abstract: In this paper we study the $alpha $-cluster tree ($alpha $-tree) under both singular and nonsingular measures. The $alpha $-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $alpha $-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $alpha $-tree fails to converge in the $L_{infty }$ metric but is still consistent under the integrated distance. We also observe a new type of critical points—the dimensional critical points (DCPs)—of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $alpha $-tree. Full Article
science and technology Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT James G. Scott, James O. BergerSource: Ann. Statist., Volume 38, Number 5, 2587--2619.Abstract: This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising aymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains. Full Article
science and technology ESB By looselycoupled.com Published On :: 2003-07-16T10:00:00-00:00 (Enterprise Service Bus) Universal integration backbone. An ESB acts as a shared messaging layer for connecting applications and other services throughout an enterprise computing infrastructure. It supplements its core asynchronous messaging backbone with intelligent transformation and routing to ensure messages are passed reliably. Services participate in the ESB using either web services messaging standards or the Java Message System (JMS). Originally defined by analysts at Gartner, ESB is increasingly seen as a core component in a service-oriented infrastructure. Full Article
science and technology XSLT By looselycoupled.com Published On :: 2003-07-16T10:00:00-00:00 (eXtensible Stylesheet Language Transformations) Reformats XML data. Endorsed as a standard by the W3C, XSLT is a language for transforming data extracted from an XML document into a new document. The destination format can be in simple text, HTML or XML. The most common use of XSLT is in conjunction with XML Path Language (XPath) in e-business integration, when its role is to transform XML data between the different XML structures used by each participant. XSLT is also widely used in website content management, to convert raw XML into HTML pages. Full Article
science and technology ASP By looselycoupled.com Published On :: 2003-07-22T12:00:00-00:00 (Application Service Provider) Hosted software contractor. An ASP operates software at its data center, which customers access online under a service contract. The first wave of ASPs were application outsourcers, who hosted software packages from established enterprise software vendors such as SAP, Peoplesoft and Oracle. They are being superceded by a larger group of 'web-native' or 'net-native' ASPs, who have developed their own software, often using open-source platforms such as Linux, Apache and Perl, specifically for delivery as a hosted service. Full Article
science and technology J2EE By looselycoupled.com Published On :: 2003-08-07T17:00:00-00:00 (Java 2 Platform Enterprise Edition) Java-based software infrastructure. A standardized set of infrastructure software components written in the 'run-anywhere' Java programming language. Although owned by Sun, J2EE has been defined in collaboration with other leading vendors including IBM, Oracle and BEA. Today, it forms the basis of many of the leading web systems software platforms, including IBM WebSphere, BEA WebLogic, Sun ONE and the open-source JBoss. J2EE version 1.4, in beta until late summer 2003, incorporates full support for web services standards and introduces a J2EE version of the Java Connector Architecture (JCA), which simplifies connections and messaging from external resources. Full Article
science and technology interoperability By looselycoupled.com Published On :: 2003-08-07T17:00:00-00:00 Ability to work with each other. In the loosely coupled environment of a service-oriented architecture, separate resources don't need to know the details of how they each work, but they need to have enough common ground to reliably exchange messages without error or misunderstanding. Standardized specifications go a long way towards creating this common ground, but differences in implementation may still lead to breakdowns in communication. Interoperability is when services can interact with each other without encountering such problems. Full Article