or Adaptive estimation in the supremum norm for semiparametric mixtures of regressions By projecteuclid.org Published On :: Thu, 23 Apr 2020 22:01 EDT Heiko Werner, Hajo Holzmann, Pierre Vandekerkhove. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1816--1871.Abstract: We investigate a flexible two-component semiparametric mixture of regressions model, in which one of the conditional component distributions of the response given the covariate is unknown but assumed symmetric about a location parameter, while the other is specified up to a scale parameter. The location and scale parameters together with the proportion are allowed to depend nonparametrically on covariates. After settling identifiability, we provide local M-estimators for these parameters which converge in the sup-norm at the optimal rates over Hölder-smoothness classes. We also introduce an adaptive version of the estimators based on the Lepski-method. Sup-norm bounds show that the local M-estimator properly estimates the functions globally, and are the first step in the construction of useful inferential tools such as confidence bands. In our analysis we develop general results about rates of convergence in the sup-norm as well as adaptive estimation of local M-estimators which might be of some independent interest, and which can also be applied in various other settings. We investigate the finite-sample behaviour of our method in a simulation study, and give an illustration to a real data set from bioinformatics. Full Article
or Nonparametric false discovery rate control for identifying simultaneous signals By projecteuclid.org Published On :: Thu, 23 Apr 2020 22:01 EDT Sihai Dave Zhao, Yet Tien Nguyen. Source: Electronic Journal of Statistics, Volume 14, Number 1, 110--142.Abstract: It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available in the R package github.com/sdzhao/ssa. Full Article
or Bias correction in conditional multivariate extremes By projecteuclid.org Published On :: Wed, 22 Apr 2020 04:02 EDT Mikael Escobar-Bach, Yuri Goegebeur, Armelle Guillou. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1773--1795.Abstract: We consider bias-corrected estimation of the stable tail dependence function in the regression context. To this aim, we first estimate the bias of a smoothed estimator of the stable tail dependence function, and then we subtract it from the estimator. The weak convergence, as a stochastic process, of the resulting asymptotically unbiased estimator of the conditional stable tail dependence function, correctly normalized, is established under mild assumptions, the covariate argument being fixed. The finite sample behaviour of our asymptotically unbiased estimator is then illustrated on a simulation study and compared to two alternatives, which are not bias corrected. Finally, our methodology is applied to a dataset of air pollution measurements. Full Article
or Non-parametric adaptive estimation of order 1 Sobol indices in stochastic models, with an application to Epidemiology By projecteuclid.org Published On :: Wed, 22 Apr 2020 04:02 EDT Gwenaëlle Castellan, Anthony Cousien, Viet Chi Tran. Source: Electronic Journal of Statistics, Volume 14, Number 1, 50--81.Abstract: Global sensitivity analysis is a set of methods aiming at quantifying the contribution of an uncertain input parameter of the model (or combination of parameters) on the variability of the response. We consider here the estimation of the Sobol indices of order 1 which are commonly-used indicators based on a decomposition of the output’s variance. In a deterministic framework, when the same inputs always give the same outputs, these indices are usually estimated by replicated simulations of the model. In a stochastic framework, when the response given a set of input parameters is not unique due to randomness in the model, metamodels are often used to approximate the mean and dispersion of the response by deterministic functions. We propose a new non-parametric estimator without the need of defining a metamodel to estimate the Sobol indices of order 1. The estimator is based on warped wavelets and is adaptive in the regularity of the model. The convergence of the mean square error to zero, when the number of simulations of the model tend to infinity, is computed and an elbow effect is shown, depending on the regularity of the model. Applications in Epidemiology are carried to illustrate the use of non-parametric estimators. Full Article
or Simultaneous transformation and rounding (STAR) models for integer-valued data By projecteuclid.org Published On :: Wed, 15 Apr 2020 04:02 EDT Daniel R. Kowal, Antonio Canale. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1744--1772.Abstract: We propose a simple yet powerful framework for modeling integer-valued data, such as counts, scores, and rounded data. The data-generating process is defined by Simultaneously Transforming and Rounding (STAR) a continuous-valued process, which produces a flexible family of integer-valued distributions capable of modeling zero-inflation, bounded or censored data, and over- or underdispersion. The transformation is modeled as unknown for greater distributional flexibility, while the rounding operation ensures a coherent integer-valued data-generating process. An efficient MCMC algorithm is developed for posterior inference and provides a mechanism for adaptation of successful Bayesian models and algorithms for continuous data to the integer-valued data setting. Using the STAR framework, we design a new Bayesian Additive Regression Tree model for integer-valued data, which demonstrates impressive predictive distribution accuracy for both synthetic data and a large healthcare utilization dataset. For interpretable regression-based inference, we develop a STAR additive model, which offers greater flexibility and scalability than existing integer-valued models. The STAR additive model is applied to study the recent decline in Amazon river dolphins. Full Article
or Posterior contraction and credible sets for filaments of regression functions By projecteuclid.org Published On :: Tue, 14 Apr 2020 22:01 EDT Wei Li, Subhashis Ghosal. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1707--1743.Abstract: A filament consists of local maximizers of a smooth function $f$ when moving in a certain direction. A filamentary structure is an important feature of the shape of an object and is also considered as an important lower dimensional characterization of multivariate data. There have been some recent theoretical studies of filaments in the nonparametric kernel density estimation context. This paper supplements the current literature in two ways. First, we provide a Bayesian approach to the filament estimation in regression context and study the posterior contraction rates using a finite random series of B-splines basis. Compared with the kernel-estimation method, this has a theoretical advantage as the bias can be better controlled when the function is smoother, which allows obtaining better rates. Assuming that $f:mathbb{R}^{2}mapsto mathbb{R}$ belongs to an isotropic Hölder class of order $alpha geq 4$, with the optimal choice of smoothing parameters, the posterior contraction rates for the filament points on some appropriately defined integral curves and for the Hausdorff distance of the filament are both $(n/log n)^{(2-alpha )/(2(1+alpha ))}$. Secondly, we provide a way to construct a credible set with sufficient frequentist coverage for the filaments. We demonstrate the success of our proposed method in simulations and one application to earthquake data. Full Article
or A fast MCMC algorithm for the uniform sampling of binary matrices with fixed margins By projecteuclid.org Published On :: Thu, 09 Apr 2020 04:00 EDT Guanyang Wang. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1690--1706.Abstract: Uniform sampling of binary matrix with fixed margins is an important and difficult problem in statistics, computer science, ecology and so on. The well-known swap algorithm would be inefficient when the size of the matrix becomes large or when the matrix is too sparse/dense. Here we propose the Rectangle Loop algorithm, a Markov chain Monte Carlo algorithm to sample binary matrices with fixed margins uniformly. Theoretically the Rectangle Loop algorithm is better than the swap algorithm in Peskun’s order. Empirically studies also demonstrates the Rectangle Loop algorithm is remarkablely more efficient than the swap algorithm. Full Article
or Beta-Binomial stick-breaking non-parametric prior By projecteuclid.org Published On :: Wed, 08 Apr 2020 22:01 EDT María F. Gil–Leyva, Ramsés H. Mena, Theodoros Nicoleris. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1479--1507.Abstract: A new class of nonparametric prior distributions, termed Beta-Binomial stick-breaking process, is proposed. By allowing the underlying length random variables to be dependent through a Beta marginals Markov chain, an appealing discrete random probability measure arises. The chain’s dependence parameter controls the ordering of the stick-breaking weights, and thus tunes the model’s label-switching ability. Also, by tuning this parameter, the resulting class contains the Dirichlet process and the Geometric process priors as particular cases, which is of interest for MCMC implementations. Some properties of the model are discussed and a density estimation algorithm is proposed and tested with simulated datasets. Full Article
or Nonconcave penalized estimation in sparse vector autoregression model By projecteuclid.org Published On :: Wed, 01 Apr 2020 04:00 EDT Xuening Zhu. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1413--1448.Abstract: High dimensional time series receive considerable attention recently, whose temporal and cross-sectional dependency could be captured by the vector autoregression (VAR) model. To tackle with the high dimensionality, penalization methods are widely employed. However, theoretically, the existing studies of the penalization methods mainly focus on $i.i.d$ data, therefore cannot quantify the effect of the dependence level on the convergence rate. In this work, we use the spectral properties of the time series to quantify the dependence and derive a nonasymptotic upper bound for the estimation errors. By focusing on the nonconcave penalization methods, we manage to establish the oracle properties of the penalized VAR model estimation by considering the effects of temporal and cross-sectional dependence. Extensive numerical studies are conducted to compare the finite sample performance using different penalization functions. Lastly, an air pollution data of mainland China is analyzed for illustration purpose. Full Article
or A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables By projecteuclid.org Published On :: Fri, 27 Mar 2020 22:00 EDT Ryoya Oda, Hirokazu Yanagihara. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1386--1412.Abstract: We put forward a variable selection method for selecting explanatory variables in a normality-assumed multivariate linear regression. It is cumbersome to calculate variable selection criteria for all subsets of explanatory variables when the number of explanatory variables is large. Therefore, we propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. The consistency of the method is provided by a high-dimensional asymptotic framework such that the sample size and the sum of the dimensions of response vectors and explanatory vectors divided by the sample size tend to infinity and some positive constant which are less than one, respectively. Through numerical simulations, it is shown that the proposed method has a high probability of selecting the true subset of explanatory variables and is fast under a moderate sample size even when the number of dimensions is large. Full Article
or Computing the degrees of freedom of rank-regularized estimators and cousins By projecteuclid.org Published On :: Thu, 26 Mar 2020 22:03 EDT Rahul Mazumder, Haolei Weng. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1348--1385.Abstract: Estimating a low rank matrix from its linear measurements is a problem of central importance in contemporary statistical analysis. The choice of tuning parameters for estimators remains an important challenge from a theoretical and practical perspective. To this end, Stein’s Unbiased Risk Estimate (SURE) framework provides a well-grounded statistical framework for degrees of freedom estimation. In this paper, we use the SURE framework to obtain degrees of freedom estimates for a general class of spectral regularized matrix estimators—our results generalize beyond the class of estimators that have been studied thus far. To this end, we use a result due to Shapiro (2002) pertaining to the differentiability of symmetric matrix valued functions, developed in the context of semidefinite optimization algorithms. We rigorously verify the applicability of Stein’s Lemma towards the derivation of degrees of freedom estimates; and also present new techniques based on Gaussian convolution to estimate the degrees of freedom of a class of spectral estimators, for which Stein’s Lemma does not directly apply. Full Article
or Differential network inference via the fused D-trace loss with cross variables By projecteuclid.org Published On :: Tue, 24 Mar 2020 22:01 EDT Yichong Wu, Tiejun Li, Xiaoping Liu, Luonan Chen. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1269--1301.Abstract: Detecting the change of biological interaction networks is of great importance in biological and medical research. We proposed a simple loss function, named as CrossFDTL, to identify the network change or differential network by estimating the difference between two precision matrices under Gaussian assumption. The CrossFDTL is a natural fusion of the D-trace loss for the considered two networks by imposing the $ell _{1}$ penalty to the differential matrix to ensure sparsity. The key point of our method is to utilize the cross variables, which correspond to the sum and difference of two precision matrices instead of using their original forms. Moreover, we developed an efficient minimization algorithm for the proposed loss function and further rigorously proved its convergence. Numerical results showed that our method outperforms the existing methods in both accuracy and convergence speed for the simulated and real data. Full Article
or Consistency and asymptotic normality of Latent Block Model estimators By projecteuclid.org Published On :: Mon, 23 Mar 2020 22:02 EDT Vincent Brault, Christine Keribin, Mahendra Mariadassou. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1234--1268.Abstract: The Latent Block Model (LBM) is a model-based method to cluster simultaneously the $d$ columns and $n$ rows of a data matrix. Parameter estimation in LBM is a difficult and multifaceted problem. Although various estimation strategies have been proposed and are now well understood empirically, theoretical guarantees about their asymptotic behavior is rather sparse and most results are limited to the binary setting. We prove here theoretical guarantees in the valued settings. We show that under some mild conditions on the parameter space, and in an asymptotic regime where $log (d)/n$ and $log (n)/d$ tend to $0$ when $n$ and $d$ tend to infinity, (1) the maximum-likelihood estimate of the complete model (with known labels) is consistent and (2) the log-likelihood ratios are equivalent under the complete and observed (with unknown labels) models. This equivalence allows us to transfer the asymptotic consistency, and under mild conditions, asymptotic normality, to the maximum likelihood estimate under the observed model. Moreover, the variational estimator is also consistent and, under the same conditions, asymptotically normal. Full Article
or A general drift estimation procedure for stochastic differential equations with additive fractional noise By projecteuclid.org Published On :: Tue, 25 Feb 2020 22:00 EST Fabien Panloup, Samy Tindel, Maylis Varvenne. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1075--1136.Abstract: In this paper we consider the drift estimation problem for a general differential equation driven by an additive multidimensional fractional Brownian motion, under ergodic assumptions on the drift coefficient. Our estimation procedure is based on the identification of the invariant measure, and we provide consistency results as well as some information about the convergence rate. We also give some examples of coefficients for which the identifiability assumption for the invariant measure is satisfied. Full Article
or Testing goodness of fit for point processes via topological data analysis By projecteuclid.org Published On :: Mon, 24 Feb 2020 04:00 EST Christophe A. N. Biscio, Nicolas Chenavier, Christian Hirsch, Anne Marie Svane. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1024--1074.Abstract: We introduce tests for the goodness of fit of point patterns via methods from topological data analysis. More precisely, the persistent Betti numbers give rise to a bivariate functional summary statistic for observed point patterns that is asymptotically Gaussian in large observation windows. We analyze the power of tests derived from this statistic on simulated point patterns and compare its performance with global envelope tests. Finally, we apply the tests to a point pattern from an application context in neuroscience. As the main methodological contribution, we derive sufficient conditions for a functional central limit theorem on bounded persistent Betti numbers of point processes with exponential decay of correlations. Full Article
or Conditional density estimation with covariate measurement error By projecteuclid.org Published On :: Wed, 19 Feb 2020 22:06 EST Xianzheng Huang, Haiming Zhou. Source: Electronic Journal of Statistics, Volume 14, Number 1, 970--1023.Abstract: We consider estimating the density of a response conditioning on an error-prone covariate. Motivated by two existing kernel density estimators in the absence of covariate measurement error, we propose a method to correct the existing estimators for measurement error. Asymptotic properties of the resultant estimators under different types of measurement error distributions are derived. Moreover, we adjust bandwidths readily available from existing bandwidth selection methods developed for error-free data to obtain bandwidths for the new estimators. Extensive simulation studies are carried out to compare the proposed estimators with naive estimators that ignore measurement error, which also provide empirical evidence for the effectiveness of the proposed bandwidth selection methods. A real-life data example is used to illustrate implementation of these methods under practical scenarios. An R package, lpme, is developed for implementing all considered methods, which we demonstrate via an R code example in Appendix B.2. Full Article
or On the distribution, model selection properties and uniqueness of the Lasso estimator in low and high dimensions By projecteuclid.org Published On :: Mon, 17 Feb 2020 22:06 EST Karl Ewald, Ulrike Schneider. Source: Electronic Journal of Statistics, Volume 14, Number 1, 944--969.Abstract: We derive expressions for the finite-sample distribution of the Lasso estimator in the context of a linear regression model in low as well as in high dimensions by exploiting the structure of the optimization problem defining the estimator. In low dimensions, we assume full rank of the regressor matrix and present expressions for the cumulative distribution function as well as the densities of the absolutely continuous parts of the estimator. Our results are presented for the case of normally distributed errors, but do not hinge on this assumption and can easily be generalized. Additionally, we establish an explicit formula for the correspondence between the Lasso and the least-squares estimator. We derive analogous results for the distribution in less explicit form in high dimensions where we make no assumptions on the regressor matrix at all. In this setting, we also investigate the model selection properties of the Lasso and show that possibly only a subset of models might be selected by the estimator, completely independently of the observed response vector. Finally, we present a condition for uniqueness of the estimator that is necessary as well as sufficient. Full Article
or Generalized bounds for active subspaces By projecteuclid.org Published On :: Mon, 17 Feb 2020 22:06 EST Mario Teixeira Parente, Jonas Wallin, Barbara Wohlmuth. Source: Electronic Journal of Statistics, Volume 14, Number 1, 917--943.Abstract: In this article, we consider scenarios in which traditional estimates for the active subspace method based on probabilistic Poincaré inequalities are not valid due to unbounded Poincaré constants. Consequently, we propose a framework that allows to derive generalized estimates in the sense that it enables to control the trade-off between the size of the Poincaré constant and a weaker order of the final error bound. In particular, we investigate independently exponentially distributed random variables in dimension two or larger and give explicit expressions for corresponding Poincaré constants showing their dependence on the dimension of the problem. Finally, we suggest possibilities for future work that aim for extending the class of distributions applicable to the active subspace method as we regard this as an opportunity to enlarge its usability. Full Article
or Reduction problems and deformation approaches to nonstationary covariance functions over spheres By projecteuclid.org Published On :: Tue, 11 Feb 2020 22:03 EST Emilio Porcu, Rachid Senoussi, Enner Mendoza, Moreno Bevilacqua. Source: Electronic Journal of Statistics, Volume 14, Number 1, 890--916.Abstract: The paper considers reduction problems and deformation approaches for nonstationary covariance functions on the $(d-1)$-dimensional spheres, $mathbb{S}^{d-1}$, embedded in the $d$-dimensional Euclidean space. Given a covariance function $C$ on $mathbb{S}^{d-1}$, we chase a pair $(R,Psi)$, for a function $R:[-1,+1] o mathbb{R}$ and a smooth bijection $Psi$, such that $C$ can be reduced to a geodesically isotropic one: $C(mathbf{x},mathbf{y})=R(langle Psi (mathbf{x}),Psi (mathbf{y}) angle )$, with $langle cdot ,cdot angle $ denoting the dot product. The problem finds motivation in recent statistical literature devoted to the analysis of global phenomena, defined typically over the sphere of $mathbb{R}^{3}$. The application domains considered in the manuscript makes the problem mathematically challenging. We show the uniqueness of the representation in the reduction problem. Then, under some regularity assumptions, we provide an inversion formula to recover the bijection $Psi$, when it exists, for a given $C$. We also give sufficient conditions for reducibility. Full Article
or On a Metropolis–Hastings importance sampling estimator By projecteuclid.org Published On :: Mon, 10 Feb 2020 04:01 EST Daniel Rudolf, Björn Sprungk. Source: Electronic Journal of Statistics, Volume 14, Number 1, 857--889.Abstract: A classical approach for approximating expectations of functions w.r.t. partially known distributions is to compute the average of function values along a trajectory of a Metropolis–Hastings (MH) Markov chain. A key part in the MH algorithm is a suitable acceptance/rejection of a proposed state, which ensures the correct stationary distribution of the resulting Markov chain. However, the rejection of proposals causes highly correlated samples. In particular, when a state is rejected it is not taken any further into account. In contrast to that we consider a MH importance sampling estimator which explicitly incorporates all proposed states generated by the MH algorithm. The estimator satisfies a strong law of large numbers as well as a central limit theorem, and, in addition to that, we provide an explicit mean squared error bound. Remarkably, the asymptotic variance of the MH importance sampling estimator does not involve any correlation term in contrast to its classical counterpart. Moreover, although the analyzed estimator uses the same amount of information as the classical MH estimator, it can outperform the latter in scenarios of moderate dimensions as indicated by numerical experiments. Full Article
or Estimation of a semiparametric transformation model: A novel approach based on least squares minimization By projecteuclid.org Published On :: Tue, 04 Feb 2020 22:03 EST Benjamin Colling, Ingrid Van Keilegom. Source: Electronic Journal of Statistics, Volume 14, Number 1, 769--800.Abstract: Consider the following semiparametric transformation model $Lambda_{ heta }(Y)=m(X)+varepsilon $, where $X$ is a $d$-dimensional covariate, $Y$ is a univariate response variable and $varepsilon $ is an error term with zero mean and independent of $X$. We assume that $m$ is an unknown regression function and that ${Lambda _{ heta }: heta inTheta }$ is a parametric family of strictly increasing functions. Our goal is to develop two new estimators of the transformation parameter $ heta $. The main idea of these two estimators is to minimize, with respect to $ heta $, the $L_{2}$-distance between the transformation $Lambda _{ heta }$ and one of its fully nonparametric estimators. We consider in particular the nonparametric estimator based on the least-absolute deviation loss constructed in Colling and Van Keilegom (2019). We establish the consistency and the asymptotic normality of the two proposed estimators of $ heta $. We also carry out a simulation study to illustrate and compare the performance of our new parametric estimators to that of the profile likelihood estimator constructed in Linton et al. (2008). Full Article
or The bias and skewness of M -estimators in regression By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Christopher Withers, Saralees NadarajahSource: Electron. J. Statist., Volume 4, 1--14.Abstract: We consider M estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables. Full Article
or A Low Complexity Algorithm with O(√T) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints By Published On :: 2020 This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{max{ heta,1- heta}})$ regret and $O(T^{1- heta/2})$ constraint violations where $ hetain (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(sqrt{T})$ regret bound with $O(1)$ constraint violations. Full Article
or Universal Latent Space Model Fitting for Large Networks with Edge Covariates By Published On :: 2020 Latent space models are effective tools for statistical modeling and visualization of network data. Due to their close connection to generalized linear models, it is also natural to incorporate covariate information in them. The current paper presents two universal fitting algorithms for networks with edge covariates: one based on nuclear norm penalization and the other based on projected gradient descent. Both algorithms are motivated by maximizing the likelihood function for an existing class of inner-product models, and we establish their statistical rates of convergence for these models. In addition, the theory informs us that both methods work simultaneously for a wide range of different latent space models that allow latent positions to affect edge formation in flexible ways, such as distance models. Furthermore, the effectiveness of the methods is demonstrated on a number of real world network data sets for different statistical tasks, including community detection with and without edge covariates, and network assisted learning. Full Article
or Lower Bounds for Parallel and Randomized Convex Optimization By Published On :: 2020 We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean ($ell_2$) and $ell_infty$ spaces. Full Article
or Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms By Published On :: 2020 We consider the problem of clustering with the longest-leg path distance (LLPD) metric, which is informative for elongated and irregularly shaped clusters. We prove finite-sample guarantees on the performance of clustering with respect to this metric when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space, in the presence of a large number of high-dimensional outliers. By combining these results with spectral clustering with respect to LLPD, we provide conditions under which the Laplacian eigengap statistic correctly determines the number of clusters for a large class of data sets, and prove guarantees on the labeling accuracy of the proposed algorithm. Our methods are quite general and provide performance guarantees for spectral clustering with any ultrametric. We also introduce an efficient, easy to implement approximation algorithm for the LLPD based on a multiscale analysis of adjacency graphs, which allows for the runtime of LLPD spectral clustering to be quasilinear in the number of data points. Full Article
or Target Propagation in Recurrent Neural Networks By Published On :: 2020 Recurrent Neural Networks have been widely used to process sequence data, but have long been criticized for their biological implausibility and training difficulties related to vanishing and exploding gradients. This paper presents a novel algorithm for training recurrent networks, target propagation through time (TPTT), that outperforms standard backpropagation through time (BPTT) on four out of the five problems used for testing. The proposed algorithm is initially tested and compared to BPTT on four synthetic time lag tasks, and its performance is also measured using the sequential MNIST data set. In addition, as TPTT uses target propagation, it allows for discrete nonlinearities and could potentially mitigate the credit assignment problem in more complex recurrent architectures. Full Article
or Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information By Published On :: 2020 We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weigh the message passing to improve misclassification. Full Article
or Generalized probabilistic principal component analysis of correlated data By Published On :: 2020 Principal component analysis (PCA) is a well-established tool in machine learning and data processing. The principal axes in PCA were shown to be equivalent to the maximum marginal likelihood estimator of the factor loading matrix in a latent factor model for the observed data, assuming that the latent factors are independently distributed as standard normal distributions. However, the independence assumption may be unrealistic for many scenarios such as modeling multiple time series, spatial processes, and functional data, where the outcomes are correlated. In this paper, we introduce the generalized probabilistic principal component analysis (GPPCA) to study the latent factor model for multiple correlated outcomes, where each factor is modeled by a Gaussian process. Our method generalizes the previous probabilistic formulation of PCA (PPCA) by providing the closed-form maximum marginal likelihood estimator of the factor loadings and other parameters. Based on the explicit expression of the precision matrix in the marginal likelihood that we derived, the number of the computational operations is linear to the number of output variables. Furthermore, we also provide the closed-form expression of the marginal likelihood when other covariates are included in the mean structure. We highlight the advantage of GPPCA in terms of the practical relevance, estimation accuracy and computational convenience. Numerical studies of simulated and real data confirm the excellent finite-sample performance of the proposed approach. Full Article
or On lp-Support Vector Machines and Multidimensional Kernels By Published On :: 2020 In this paper, we extend the methodology developed for Support Vector Machines (SVM) using the $ell_2$-norm ($ell_2$-SVM) to the more general case of $ell_p$-norms with $p>1$ ($ell_p$-SVM). We derive second order cone formulations for the resulting dual and primal problems. The concept of kernel function, widely applied in $ell_2$-SVM, is extended to the more general case of $ell_p$-norms with $p>1$ by defining a new operator called multidimensional kernel. This object gives rise to reformulations of dual problems, in a transformed space of the original data, where the dependence on the original data always appear as homogeneous polynomials. We adapt known solution algorithms to efficiently solve the primal and dual resulting problems and some computational experiments on real-world datasets are presented showing rather good behavior in terms of the accuracy of $ell_p$-SVM with $p>1$. Full Article
or Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning By Published On :: 2020 One of the common tasks in unsupervised learning is dimensionality reduction, where the goal is to find meaningful low-dimensional structures hidden in high-dimensional data. Sometimes referred to as manifold learning, this problem is closely related to the problem of localization, which aims at embedding a weighted graph into a low-dimensional Euclidean space. Several methods have been proposed for localization, and also manifold learning. Nonetheless, the robustness property of most of them is little understood. In this paper, we obtain perturbation bounds for classical scaling and trilateration, which are then applied to derive performance bounds for Isomap, Landmark Isomap, and Maximum Variance Unfolding. A new perturbation bound for procrustes analysis plays a key role. Full Article
or Expectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data By Published On :: 2020 A common divide-and-conquer approach for Bayesian computation with big data is to partition the data, perform local inference for each piece separately, and combine the results to obtain a global posterior approximation. While being conceptually and computationally appealing, this method involves the problematic need to also split the prior for the local inferences; these weakened priors may not provide enough regularization for each separate computation, thus eliminating one of the key advantages of Bayesian methods. To resolve this dilemma while still retaining the generalizability of the underlying local inference method, we apply the idea of expectation propagation (EP) as a framework for distributed Bayesian inference. The central idea is to iteratively update approximations to the local likelihoods given the state of the other approximations and the prior. The present paper has two roles: we review the steps that are needed to keep EP algorithms numerically stable, and we suggest a general approach, inspired by EP, for approaching data partitioning problems in a way that achieves the computational benefits of parallelism while allowing each local update to make use of relevant information from the other sites. In addition, we demonstrate how the method can be applied in a hierarchical context to make use of partitioning of both data and parameters. The paper describes a general algorithmic framework, rather than a specific algorithm, and presents an example implementation for it. Full Article
or Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections By Published On :: 2020 We study the least-squares regression problem over a Hilbert space, covering nonparametric regression over a reproducing kernel Hilbert space as a special case. We first investigate regularized algorithms adapted to a projection operator on a closed subspace of the Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nystr"{o}m regularized algorithms. Our results provide optimal, distribution-dependent rates that do not have any saturation effect for sketched/Nystr"{o}m regularized algorithms, considering both the attainable and non-attainable cases, in the well-conditioned regimes. We then study stochastic gradient methods with projection over the subspace, allowing multi-pass over the data and minibatches, and we derive similar optimal statistical convergence results. Full Article
or Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems By Published On :: 2020 We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. Our main theoretical result provides an explicit bound on the sample or evaluation complexity: we show that these methods are guaranteed to converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems. Full Article
or A Unified Framework for Structured Graph Learning via Spectral Constraints By Published On :: 2020 Graph learning from data is a canonical problem that has received substantial attention in the literature. Learning a structured graph is essential for interpretability and identification of the relationships among data. In general, learning a graph with a specific structure is an NP-hard combinatorial problem and thus designing a general tractable algorithm is challenging. Some useful structured graphs include connected, sparse, multi-component, bipartite, and regular graphs. In this paper, we introduce a unified framework for structured graph learning that combines Gaussian graphical model and spectral graph theory. We propose to convert combinatorial structural constraints into spectral constraints on graph matrices and develop an optimization framework based on block majorization-minimization to solve structured graph learning problem. The proposed algorithms are provably convergent and practically amenable for a number of graph based applications such as data clustering. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. An open source R package containing the code for all the experiments is available at https://CRAN.R-project.org/package=spectralGraphTopology. Full Article
or Lower Bounds for Testing Graphical Models: Colorings and Antiferromagnetic Ising Models By Published On :: 2020 We study the identity testing problem in the context of spin systems or undirected graphical models, where it takes the following form: given the parameter specification of the model $M$ and a sampling oracle for the distribution $mu_{M^*}$ of an unknown model $M^*$, can we efficiently determine if the two models $M$ and $M^*$ are the same? We consider identity testing for both soft-constraint and hard-constraint systems. In particular, we prove hardness results in two prototypical cases, the Ising model and proper colorings, and explore whether identity testing is any easier than structure learning. For the ferromagnetic (attractive) Ising model, Daskalakis et al. (2018) presented a polynomial-time algorithm for identity testing. We prove hardness results in the antiferromagnetic (repulsive) setting in the same regime of parameters where structure learning is known to require a super-polynomial number of samples. Specifically, for $n$-vertex graphs of maximum degree $d$, we prove that if $|eta| d = omega(log{n})$ (where $eta$ is the inverse temperature parameter), then there is no polynomial running time identity testing algorithm unless $RP=NP$. In the hard-constraint setting, we present hardness results for identity testing for proper colorings. Our results are based on the presumed hardness of #BIS, the problem of (approximately) counting independent sets in bipartite graphs. Full Article
or A New Class of Time Dependent Latent Factor Models with Applications By Published On :: 2020 In many applications, observed data are influenced by some combination of latent causes. For example, suppose sensors are placed inside a building to record responses such as temperature, humidity, power consumption and noise levels. These random, observed responses are typically affected by many unobserved, latent factors (or features) within the building such as the number of individuals, the turning on and off of electrical devices, power surges, etc. These latent factors are usually present for a contiguous period of time before disappearing; further, multiple factors could be present at a time. This paper develops new probabilistic methodology and inference methods for random object generation influenced by latent features exhibiting temporal persistence. Every datum is associated with subsets of a potentially infinite number of hidden, persistent features that account for temporal dynamics in an observation. The ensuing class of dynamic models constructed by adapting the Indian Buffet Process — a probability measure on the space of random, unbounded binary matrices — finds use in a variety of applications arising in operations, signal processing, biomedicine, marketing, image analysis, etc. Illustrations using synthetic and real data are provided. Full Article
or On the consistency of graph-based Bayesian semi-supervised learning and the scalability of sampling algorithms By Published On :: 2020 This paper considers a Bayesian approach to graph-based semi-supervised learning. We show that if the graph parameters are suitably scaled, the graph-posteriors converge to a continuum limit as the size of the unlabeled data set grows. This consistency result has profound algorithmic implications: we prove that when consistency holds, carefully designed Markov chain Monte Carlo algorithms have a uniform spectral gap, independent of the number of unlabeled inputs. Numerical experiments illustrate and complement the theory. Full Article
or The Maximum Separation Subspace in Sufficient Dimension Reduction with Categorical Response By Published On :: 2020 Sufficient dimension reduction (SDR) is a very useful concept for exploratory analysis and data visualization in regression, especially when the number of covariates is large. Many SDR methods have been proposed for regression with a continuous response, where the central subspace (CS) is the target of estimation. Various conditions, such as the linearity condition and the constant covariance condition, are imposed so that these methods can estimate at least a portion of the CS. In this paper we study SDR for regression and discriminant analysis with categorical response. Motivated by the exploratory analysis and data visualization aspects of SDR, we propose a new geometric framework to reformulate the SDR problem in terms of manifold optimization and introduce a new concept called Maximum Separation Subspace (MASES). The MASES naturally preserves the “sufficiency” in SDR without imposing additional conditions on the predictor distribution, and directly inspires a semi-parametric estimator. Numerical studies show MASES exhibits superior performance as compared with competing SDR methods in specific settings. Full Article
or Tensor Train Decomposition on TensorFlow (T3F) By Published On :: 2020 Tensor Train decomposition is used across many branches of machine learning. We present T3F—a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization methods. The library makes it easier to implement machine learning papers that rely on the Tensor Train decomposition. T3F includes documentation, examples and 94% test coverage. Full Article
or On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent By Published On :: 2020 Dual first-order methods are essential techniques for large-scale constrained convex optimization. However, when recovering the primal solutions, we need $T(epsilon^{-2})$ iterations to achieve an $epsilon$-optimal primal solution when we apply an algorithm to the non-strongly convex dual problem with $T(epsilon^{-1})$ iterations to achieve an $epsilon$-optimal dual solution, where $T(x)$ can be $x$ or $sqrt{x}$. In this paper, we prove that the iteration complexity of the primal solutions and dual solutions have the same $Oleft(frac{1}{sqrt{epsilon}} ight)$ order of magnitude for the accelerated randomized dual coordinate ascent. When the dual function further satisfies the quadratic functional growth condition, by restarting the algorithm at any period, we establish the linear iteration complexity for both the primal solutions and dual solutions even if the condition number is unknown. When applied to the regularized empirical risk minimization problem, we prove the iteration complexity of $Oleft(nlog n+sqrt{frac{n}{epsilon}} ight)$ in both primal space and dual space, where $n$ is the number of samples. Our result takes out the $left(log frac{1}{epsilon} ight)$ factor compared with the methods based on smoothing/regularization or Catalyst reduction. As far as we know, this is the first time that the optimal $Oleft(sqrt{frac{n}{epsilon}} ight)$ iteration complexity in the primal space is established for the dual coordinate ascent based stochastic algorithms. We also establish the accelerated linear complexity for some problems with nonsmooth loss, e.g., the least absolute deviation and SVM. Full Article
or Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent By Published On :: 2020 We propose graph-dependent implicit regularisation strategies for synchronised distributed stochastic subgradient descent (Distributed SGD) for convex problems in multi-agent learning. Under the standard assumptions of convexity, Lipschitz continuity, and smoothness, we establish statistical learning rates that retain, up to logarithmic terms, single-machine serial statistical guarantees through implicit regularisation (step size tuning and early stopping) with appropriate dependence on the graph topology. Our approach avoids the need for explicit regularisation in decentralised learning problems, such as adding constraints to the empirical risk minimisation rule. Particularly for distributed methods, the use of implicit regularisation allows the algorithm to remain simple, without projections or dual methods. To prove our results, we establish graph-independent generalisation bounds for Distributed SGD that match the single-machine serial SGD setting (using algorithmic stability), and we establish graph-dependent optimisation bounds that are of independent interest. We present numerical experiments to show that the qualitative nature of the upper bounds we derive can be representative of real behaviours. Full Article
or Optimal Bipartite Network Clustering By Published On :: 2020 We study bipartite community detection in networks, or more generally the network biclustering problem. We present a fast two-stage procedure based on spectral initialization followed by the application of a pseudo-likelihood classifier twice. Under mild regularity conditions, we establish the weak consistency of the procedure (i.e., the convergence of the misclassification rate to zero) under a general bipartite stochastic block model. We show that the procedure is optimal in the sense that it achieves the optimal convergence rate that is achievable by a biclustering oracle, adaptively over the whole class, up to constants. This is further formalized by deriving a minimax lower bound over a class of biclustering problems. The optimal rate we obtain sharpens some of the existing results and generalizes others to a wide regime of average degree growth, from sparse networks with average degrees growing arbitrarily slowly to fairly dense networks with average degrees of order $sqrt{n}$. As a special case, we recover the known exact recovery threshold in the $log n$ regime of sparsity. To obtain the consistency result, as part of the provable version of the algorithm, we introduce a sub-block partitioning scheme that is also computationally attractive, allowing for distributed implementation of the algorithm without sacrificing optimality. The provable algorithm is derived from a general class of pseudo-likelihood biclustering algorithms that employ simple EM type updates. We show the effectiveness of this general class by numerical simulations. Full Article
or Branch and Bound for Piecewise Linear Neural Network Verification By Published On :: 2020 The success of Deep Learning and its potential use in many safety-critical applicationshas motivated research on formal verification of Neural Network (NN) models. In thiscontext, verification involves proving or disproving that an NN model satisfies certaininput-output properties. Despite the reputation of learned NN models as black boxes,and the theoretical hardness of proving useful properties about them, researchers havebeen successful in verifying some classes of models by exploiting their piecewise linearstructure and taking insights from formal methods such as Satisifiability Modulo Theory.However, these methods are still far from scaling to realistic neural networks. To facilitateprogress on this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases.With the help of the BaB framework, we make three key contributions. Firstly, we identifynew methods that combine the strengths of multiple existing approaches, accomplishingsignificant performance improvements over previous state of the art. Secondly, we introducean effective branching strategy on ReLU non-linearities. This branching strategy allows usto efficiently and successfully deal with high input dimensional problems with convolutionalnetwork architecture, on which previous methods fail frequently. Finally, we proposecomprehensive test data sets and benchmarks which includes a collection of previouslyreleased testcases. We use the data sets to conduct a thorough experimental comparison ofexisting and new algorithms and to provide an inclusive analysis of the factors impactingthe hardness of verification problems. Full Article
or Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data By Published On :: 2020 We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeoffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN and an LSTM. As an example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection by modifying only five characters through Greedy Attack. Full Article
or Dynamical Systems as Temporal Feature Spaces By Published On :: 2020 Parametrised state space models in the form of recurrent networks are often used in machine learning to learn from data streams exhibiting temporal dependencies. To break the black box nature of such models it is important to understand the dynamical features of the input-driving time series that are formed in the state space. We propose a framework for rigorous analysis of such state representations in vanishing memory state space models such as echo state networks (ESN). In particular, we consider the state space a temporal feature space and the readout mapping from the state space a kernel machine operating in that feature space. We show that: (1) The usual ESN strategy of randomly generating input-to-state, as well as state coupling leads to shallow memory time series representations, corresponding to cross-correlation operator with fast exponentially decaying coefficients; (2) Imposing symmetry on dynamic coupling yields a constrained dynamic kernel matching the input time series with straightforward exponentially decaying motifs or exponentially decaying motifs of the highest frequency; (3) Simple ring (cycle) high-dimensional reservoir topology specified only through two free parameters can implement deep memory dynamic kernels with a rich variety of matching motifs. We quantify richness of feature representations imposed by dynamic kernels and demonstrate that for dynamic kernel associated with cycle reservoir topology, the kernel richness undergoes a phase transition close to the edge of stability. Full Article
or pyts: A Python Package for Time Series Classification By Published On :: 2020 pyts is an open-source Python package for time series classification. This versatile toolbox provides implementations of many algorithms published in the literature, preprocessing functionalities, and data set loading utilities. pyts relies on the standard scientific Python packages numpy, scipy, scikit-learn, joblib, and numba, and is distributed under the BSD-3-Clause license. Documentation contains installation instructions, a detailed user guide, a full API description, and concrete self-contained examples. Full Article
or Ancestral Gumbel-Top-k Sampling for Sampling Without Replacement By Published On :: 2020 We develop ancestral Gumbel-Top-$k$ sampling: a generic and efficient method for sampling without replacement from discrete-valued Bayesian networks, which includes multivariate discrete distributions, Markov chains and sequence models. The method uses an extension of the Gumbel-Max trick to sample without replacement by finding the top $k$ of perturbed log-probabilities among all possible configurations of a Bayesian network. Despite the exponentially large domain, the algorithm has a complexity linear in the number of variables and sample size $k$. Our algorithm allows to set the number of parallel processors $m$, to trade off the number of iterations versus the total cost (iterations times $m$) of running the algorithm. For $m = 1$ the algorithm has minimum total cost, whereas for $m = k$ the number of iterations is minimized, and the resulting algorithm is known as Stochastic Beam Search. We provide extensions of the algorithm and discuss a number of related algorithms. We analyze the properties of ancestral Gumbel-Top-$k$ sampling and compare against alternatives on randomly generated Bayesian networks with different levels of connectivity. In the context of (deep) sequence models, we show its use as a method to generate diverse but high-quality translations and statistical estimates of translation quality and entropy. Full Article
or Skill Rating for Multiplayer Games. Introducing Hypernode Graphs and their Spectral Theory By Published On :: 2020 We consider the skill rating problem for multiplayer games, that is how to infer player skills from game outcomes in multiplayer games. We formulate the problem as a minimization problem $arg min_{s} s^T Delta s$ where $Delta$ is a positive semidefinite matrix and $s$ a real-valued function, of which some entries are the skill values to be inferred and other entries are constrained by the game outcomes. We leverage graph-based semi-supervised learning (SSL) algorithms for this problem. We apply our algorithms on several data sets of multiplayer games and obtain very promising results compared to Elo Duelling (see Elo, 1978) and TrueSkill (see Herbrich et al., 2006).. As we leverage graph-based SSL algorithms and because games can be seen as relations between sets of players, we then generalize the approach. For this aim, we introduce a new finite model, called hypernode graph, defined to be a set of weighted binary relations between sets of nodes. We define Laplacians of hypernode graphs. Then, we show that the skill rating problem for multiplayer games can be formulated as $arg min_{s} s^T Delta s$ where $Delta$ is the Laplacian of a hypernode graph constructed from a set of games. From a fundamental perspective, we show that hypernode graph Laplacians are symmetric positive semidefinite matrices with constant functions in their null space. We show that problems on hypernode graphs can not be solved with graph constructions and graph kernels. We relate hypernode graphs to signed graphs showing that positive relations between groups can lead to negative relations between individuals. Full Article
or Ensemble Learning for Relational Data By Published On :: 2020 We present a theoretical analysis framework for relational ensemble models. We show that ensembles of collective classifiers can improve predictions for graph data by reducing errors due to variance in both learning and inference. In addition, we propose a relational ensemble framework that combines a relational ensemble learning approach with a relational ensemble inference approach for collective classification. The proposed ensemble techniques are applicable for both single and multiple graph settings. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed framework. Finally, our experimental results support the theoretical analysis and confirm that ensemble algorithms that explicitly focus on both learning and inference processes and aim at reducing errors associated with both, are the best performers. Full Article