Calibration Procedures for Approximate Bayesian Credible Sets. Jeong Eun Lee, Geoff K. Nicholls, Robin J. Ryder. Source: Bayesian Analysis, Volume 14, Number 4, 1245–1269. Abstract: We develop and apply two calibration procedures for checking the coverage of approximate Bayesian credible sets, including intervals estimated using Monte Carlo methods. The user has an ideal prior and likelihood, but generates a credible set for an approximate posterior based on some approximate prior and likelihood. We estimate the realised posterior coverage achieved by the approximate credible set. This is the coverage of the unknown “true” parameter if the data are a realisation of the user’s ideal observation model conditioned on the parameter, and the parameter is a draw from the user’s ideal prior. In one approach we estimate the posterior coverage at the data by fitting a semi-parametric logistic regression of binary coverage outcomes against summary statistics, both computed on simulated data. In another we use importance sampling from the approximate posterior, windowing simulated data to fall close to the observed data. We illustrate our methods on four examples.
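A minimal sketch of the regression-based calibration idea above, under a toy normal-normal model (the interval rule and the use of plain logistic regression are stand-ins; the paper uses a semi-parametric regression and more refined summaries):

```python
# Simulate (theta, y) from the ideal prior and observation model, record
# whether a (deliberately crude) approximate credible interval covers theta,
# then regress the binary coverage outcome on a summary statistic and
# predict the realised coverage at the observed data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_sim = 2000

theta = rng.normal(0.0, 2.0, n_sim)          # draws from the ideal prior
y = rng.normal(theta, 1.0)                   # ideal observation model
lo, hi = y - 1.5, y + 1.5                    # stand-in approximate credible set
covered = ((lo <= theta) & (theta <= hi)).astype(int)

fit = LogisticRegression().fit(y.reshape(-1, 1), covered)  # coverage vs summary
y_obs = 0.7                                  # the data actually observed
print(fit.predict_proba([[y_obs]])[0, 1])    # estimated coverage near y_obs
```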
Estimating the Use of Public Lands: Integrated Modeling of Open Populations with Convolution Likelihood Ecological Abundance Regression. Lutz F. Gruber, Erica F. Stuber, Lyndsie S. Wszola, Joseph J. Fontaine. Source: Bayesian Analysis, Volume 14, Number 4, 1173–1199. Abstract: We present an integrated open population model where the population dynamics are defined by a differential equation, and the related statistical model utilizes a Poisson-binomial convolution likelihood. Key advantages of the proposed approach over existing open population models include the flexibility to predict related, but unobserved quantities such as total immigration or emigration over a specified time period, and more computationally efficient posterior simulation by eliminating the need to explicitly simulate latent immigration and emigration. The viability of the proposed method is shown in an in-depth analysis of outdoor recreation participation on public lands, where the surveyed populations changed rapidly and demographic population closure cannot be assumed even within a single day.
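One plausible reading of a Poisson-binomial convolution likelihood (an assumed form for illustration, not necessarily the authors' exact specification): the count at time t is the sum of binomially thinned survivors from time t-1 and Poisson immigrants, so its pmf is a discrete convolution of the two components:

```python
import numpy as np
from scipy.stats import binom, poisson
from scipy.special import logsumexp

def convolution_logpmf(n_now, n_prev, survival_p, immigration_rate):
    """log P(N_t = n_now | N_{t-1} = n_prev) under binomial survival with
    probability survival_p plus Poisson(immigration_rate) arrivals."""
    k = np.arange(0, min(n_now, n_prev) + 1)      # possible survivor counts
    log_terms = (binom.logpmf(k, n_prev, survival_p)
                 + poisson.logpmf(n_now - k, immigration_rate))
    return logsumexp(log_terms)                    # sum over the convolution

print(convolution_logpmf(n_now=12, n_prev=10, survival_p=0.8, immigration_rate=5.0))
```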
Implicit Copulas from Bayesian Regularized Regression Smoothers. Nadja Klein, Michael Stanley Smith. Source: Bayesian Analysis, Volume 14, Number 4, 1143–1171. Abstract: We show how to extract the implicit copula of a response vector from a Bayesian regularized regression smoother with Gaussian disturbances. The copula can be used to compare smoothers that employ different shrinkage priors and function bases. We illustrate with three popular choices of shrinkage priors—a pairwise prior, the horseshoe prior and a g prior augmented with a point mass as employed for Bayesian variable selection—and both univariate and multivariate function bases. The implicit copulas are high-dimensional, have flexible dependence structures that are far from that of a Gaussian copula, and are unavailable in closed form. However, we show how they can be evaluated by first constructing a Gaussian copula conditional on the regularization parameters, and then integrating over these. Combined with non-parametric margins, the regularized smoothers can be used to model the distribution of non-Gaussian univariate responses conditional on the covariates. Efficient Markov chain Monte Carlo schemes for evaluating the copula are given for this case. Using both simulated and real data, we show how such copula smoothing models can improve the quality of resulting function estimates and predictive distributions.
Bayesian Functional Forecasting with Locally-Autoregressive Dependent Processes. Guillaume Kon Kam King, Antonio Canale, Matteo Ruggiero. Source: Bayesian Analysis, Volume 14, Number 4, 1121–1141. Abstract: Motivated by the problem of forecasting demand and offer curves, we introduce a class of nonparametric dynamic models with locally-autoregressive behaviour, and provide a full inferential strategy for forecasting time series of piecewise-constant non-decreasing functions over arbitrary time horizons. The model is induced by a non-Markovian system of interacting particles whose evolution is governed by a resampling step and a drift mechanism. The former is based on a global interaction and accounts for the volatility of the functional time series, while the latter is determined by a neighbourhood-based interaction with the past curves and accounts for local trend behaviours, separating these from pure noise. We discuss the implementation of the model for functional forecasting by combining population Monte Carlo with a semi-automatic learning approach to approximate Bayesian computation, both of which require limited tuning. We validate the inference method with a simulation study, and carry out predictive inference on a real dataset on the Italian natural gas market.
Variance Prior Forms for High-Dimensional Bayesian Variable Selection. Gemma E. Moran, Veronika Ročková, Edward I. George. Source: Bayesian Analysis, Volume 14, Number 4, 1091–1119. Abstract: Consider the problem of high-dimensional variable selection for the Gaussian linear model when the unknown error variance is also of interest. In this paper, we show that the use of conjugate shrinkage priors for Bayesian variable selection can have detrimental consequences for such variance estimation. Such priors are often motivated by the invariance argument of Jeffreys (1961). Revisiting this work, however, we highlight a caveat that Jeffreys himself noticed; namely, that biased estimators can result from inducing dependence between parameters a priori. In a similar way, we show that conjugate priors for linear regression, which induce prior dependence, can lead to underestimation of the error variance in the Bayesian high-dimensional regression setting. Following Jeffreys, we recommend as a remedy to treat regression coefficients and the error variance as independent a priori. Using such an independence prior framework, we extend the Spike-and-Slab Lasso of Ročková and George (2018) to the unknown variance case. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on simulated data. On the protein activity dataset of Clyde and Parmigiani (1998), the Spike-and-Slab Lasso with unknown variance achieves lower cross-validation error than alternative penalized likelihood methods, demonstrating the gains in predictive accuracy afforded by simultaneous error variance estimation. The unknown variance implementation of the Spike-and-Slab Lasso is provided in the publicly available R package SSLASSO (Ročková and Moran, 2017).
Beyond Whittle: Nonparametric Correction of a Parametric Likelihood with a Focus on Bayesian Time Series Analysis. Claudia Kirch, Matthew C. Edwards, Alexander Meier, Renate Meyer. Source: Bayesian Analysis, Volume 14, Number 4, 1037–1073. Abstract: Nonparametric Bayesian inference has seen rapid growth over the last decade, but only a few nonparametric Bayesian approaches to time series analysis have been developed. Most existing approaches use Whittle’s likelihood for Bayesian modelling of the spectral density as the main nonparametric characteristic of stationary time series. It is known that the loss of efficiency using Whittle’s likelihood can be substantial. On the other hand, parametric methods are more powerful than nonparametric methods if the observed time series is close to the considered model class, but fail if the model is misspecified. Therefore, we suggest a nonparametric correction of a parametric likelihood that takes advantage of the efficiency of parametric models while mitigating sensitivities through a nonparametric amendment. We use a nonparametric Bernstein polynomial prior on the spectral density with weights induced by a Dirichlet process and prove posterior consistency for Gaussian stationary time series. Bayesian posterior computations are implemented via an MH-within-Gibbs sampler, and the performance of the nonparametrically corrected likelihood for Gaussian time series is illustrated in a simulation study and in three astronomy applications, including estimating the spectral density of gravitational wave data from the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO).
On the Geometry of Bayesian Inference. Miguel de Carvalho, Garritt L. Page, Bradley J. Barney. Source: Bayesian Analysis, Volume 14, Number 4, 1013–1036. Abstract: We provide a geometric interpretation to Bayesian inference that allows us to introduce a natural measure of the level of agreement between priors, likelihoods, and posteriors. The starting point for the construction of our geometry is the observation that the marginal likelihood can be regarded as an inner product between the prior and the likelihood. A key concept in our geometry is that of compatibility, a measure which is based on the same construction principles as Pearson correlation, but which can be used to assess how much the prior agrees with the likelihood, to gauge the sensitivity of the posterior to the prior, and to quantify the coherency of the opinions of two experts. Estimators for all the quantities involved in our geometric setup are discussed; these can be directly computed from the posterior simulation output. Some examples are used to illustrate our methods, including data related to on-the-job drug usage, midge wing length, and prostate cancer.
A Bayesian Conjugate Gradient Method (with Discussion). Jon Cockayne, Chris J. Oates, Ilse C.F. Ipsen, Mark Girolami. Source: Bayesian Analysis, Volume 14, Number 3, 937–1012. Abstract: A fundamental task in numerical computation is the solution of large linear systems. The conjugate gradient method is an iterative method which offers rapid convergence to the solution, particularly when an effective preconditioner is employed. However, for more challenging systems a substantial error can be present even after many iterations have been performed. The estimates obtained in this case are of little value unless further information can be provided about, for example, the magnitude of the error. In this paper we propose a novel statistical model for this error, set in a Bayesian framework. Our approach is a strict generalisation of the conjugate gradient method, which is recovered as the posterior mean for a particular choice of prior. The estimates obtained are analysed with Krylov subspace methods and a contraction result for the posterior is presented. The method is then analysed in a simulation study as well as being applied to a challenging problem in medical imaging.
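For reference, the classical conjugate gradient iteration that the Bayesian method strictly generalises (the paper recovers it as a posterior mean under a particular prior); this is the textbook algorithm, not the BayesCG code:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)     # step size along p
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)  # conjugacy correction
        p = r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))        # ~ [0.0909, 0.6364]
```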
Extrinsic Gaussian Processes for Regression and Classification on Manifolds. Lizhen Lin, Niu Mu, Pokman Cheung, David Dunson. Source: Bayesian Analysis, Volume 14, Number 3, 907–926. Abstract: Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory and algorithms related to GPs, the overwhelming majority of this literature focuses on the case in which the input domain corresponds to a Euclidean space. However, particularly in recent years with the increasing collection of complex data, it is commonly the case that the input domain does not have such a simple form. For example, it is common for the inputs to be restricted to a non-Euclidean manifold, a case which forms the motivation for this article. In particular, we propose a general extrinsic framework for GP modeling on manifolds, which relies on embedding of the manifold into a Euclidean space and then constructing extrinsic kernels for GPs on their images. These extrinsic Gaussian processes (eGPs) are used as prior distributions for unknown functions in Bayesian inferences. Our approach is simple and general, and we show that the eGPs inherit fine theoretical properties from GP models in Euclidean spaces. We consider applications of our models to regression and classification problems with predictors lying in a large class of manifolds, including spheres, planar shape spaces, a space of positive definite matrices, and Grassmannians. Our models can be readily used by practitioners in biological sciences for various regression and classification problems, such as disease diagnosis or detection. Our work is also likely to have impact in spatial statistics when spatial locations are on the sphere or other geometric spaces.
Jointly Robust Prior for Gaussian Stochastic Process in Emulation, Calibration and Variable Selection. Mengyang Gu. Source: Bayesian Analysis, Volume 14, Number 3, 877–905. Abstract: Gaussian stochastic process (GaSP) has been widely used in two fundamental problems in uncertainty quantification, namely the emulation and calibration of mathematical models. Some objective priors, such as the reference prior, are studied in the context of emulating (approximating) computationally expensive mathematical models. In this work, we introduce a new class of priors, called the jointly robust prior, for both the emulation and calibration. This prior is designed to maintain various advantages of the reference prior. In emulation, the jointly robust prior has an appropriate tail decay rate, like the reference prior, but is computationally simpler in parameter estimation. Moreover, the marginal posterior mode estimation with the jointly robust prior can separate the influential and inert inputs in mathematical models, while the reference prior does not have this property. We establish posterior propriety for a large class of priors in calibration, including the reference prior and the jointly robust prior in general scenarios, but the jointly robust prior is preferred because the calibrated mathematical model typically predicts reality well. The jointly robust prior is used as the default prior in two new R packages, called “RobustGaSP” and “RobustCalibration”, available on CRAN for emulation and calibration, respectively.
Bayesian Zero-Inflated Negative Binomial Regression Based on Pólya-Gamma Mixtures. Brian Neelon. Source: Bayesian Analysis, Volume 14, Number 3, 849–875. Abstract: Motivated by a study examining spatiotemporal patterns in inpatient hospitalizations, we propose an efficient Bayesian approach for fitting zero-inflated negative binomial models. To facilitate posterior sampling, we introduce a set of latent variables that are represented as scale mixtures of normals, where the precision terms follow independent Pólya-Gamma distributions. Conditional on the latent variables, inference proceeds via straightforward Gibbs sampling. For fixed-effects models, our approach is comparable to existing methods. However, our model can accommodate more complex data structures, including multivariate and spatiotemporal data, settings in which current approaches often fail due to computational challenges. Using simulation studies, we highlight key features of the method and compare its performance to other estimation procedures. We apply the approach to a spatiotemporal analysis examining the number of annual inpatient admissions among United States veterans with type 2 diabetes.
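A rough sketch of drawing Pólya-Gamma variables via the truncated infinite-sum representation of Polson, Scott and Windle (2013), PG(b, c) = (1/(2π²)) Σ_k g_k / ((k − 1/2)² + c²/(4π²)) with g_k ~ Gamma(b, 1); production samplers use exact methods, and the truncation below slightly underestimates the draw:

```python
import numpy as np

def polya_gamma_approx(b, c, n_terms=200, rng=None):
    """Approximate PG(b, c) draw by truncating the infinite-sum
    representation; each term uses an independent Gamma(b, 1) variate."""
    rng = rng or np.random.default_rng()
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=n_terms)
    return (g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)).sum() / (2 * np.pi ** 2)

# In a Gibbs sampler for (zero-inflated) negative binomial or logistic
# models, one such draw per observation makes the conditional distribution
# of the regression coefficients Gaussian.
print([polya_gamma_approx(1.0, 0.5) for _ in range(5)])
```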
High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors. Joseph Antonelli, Giovanni Parmigiani, Francesca Dominici. Source: Bayesian Analysis, Volume 14, Number 3, 825–848. Abstract: In observational studies, estimation of a causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of potential confounders ($p$) is larger than the number of observations ($n$), then direct control for all potential confounders is infeasible. Existing approaches for dimension reduction and penalization are generally aimed at predicting the outcome, and are less suited for estimation of causal effects. Under standard penalization approaches (e.g., Lasso), if a variable $X_{j}$ is strongly associated with the treatment $T$ but weakly with the outcome $Y$, the coefficient $\beta_{j}$ will be shrunk towards zero, thus leading to confounding bias. Under the assumption of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients $\beta_{j}$ corresponding to the potential confounders $X_{j}$. Specifically, we introduce a prior distribution that does not heavily shrink to zero the coefficients ($\beta_{j}$'s) of the $X_{j}$'s that are strongly associated with $T$ but weakly associated with $Y$. We compare our proposed approach to several state-of-the-art methods proposed in the literature. Our proposed approach has the following features: 1) it reduces confounding bias in high dimensional settings; 2) it shrinks towards zero coefficients of instrumental variables; and 3) it achieves good coverage even in small sample sizes. We apply our approach to the National Health and Nutrition Examination Survey (NHANES) data to estimate the causal effects of persistent pesticide exposure on triglyceride levels.
Probability Based Independence Sampler for Bayesian Quantitative Learning in Graphical Log-Linear Marginal Models. Ioannis Ntzoufras, Claudia Tarantola, Monia Lupparelli. Source: Bayesian Analysis, Volume 14, Number 3, 797–823. Abstract: We introduce a novel Bayesian approach for quantitative learning for graphical log-linear marginal models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. The likelihood cannot be analytically expressed as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be directly obtained, and Markov Chain Monte Carlo (MCMC) methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for such models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. We exploit a conditional conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset.
Stochastic Approximations to the Pitman–Yor Process. Julyan Arbel, Pierpaolo De Blasi, Igor Prünster. Source: Bayesian Analysis, Volume 14, Number 3, 753–771. Abstract: In this paper we consider approximations to the popular Pitman–Yor process obtained by truncating the stick-breaking representation. The truncation is determined by a random stopping rule that achieves an almost sure control on the approximation error in total variation distance. We derive the asymptotic distribution of the random truncation point as the approximation error $\epsilon$ goes to zero in terms of a polynomially tilted positive stable random variable. The practical usefulness and effectiveness of this theoretical result is demonstrated by devising a sampling algorithm to approximate functionals of the $\epsilon$-version of the Pitman–Yor process.
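A minimal sketch of an $\epsilon$-truncated stick-breaking construction of a Pitman–Yor process PY(d, α): draw sticks until the unallocated mass falls below $\epsilon$, the kind of random stopping rule the paper analyses (illustrative, not the authors' algorithm):

```python
import numpy as np

def py_stick_breaking(d, alpha, eps=1e-3, rng=None):
    """Stick-breaking weights of PY(d, alpha), truncated when the
    remaining (un-broken) mass drops below eps."""
    rng = rng or np.random.default_rng()
    weights, remaining, k = [], 1.0, 0
    while remaining >= eps:
        k += 1
        v = rng.beta(1.0 - d, alpha + k * d)   # V_k ~ Beta(1-d, alpha + k d)
        weights.append(remaining * v)
        remaining *= 1.0 - v                   # mass still to be broken
    return np.array(weights), remaining        # remaining < eps bounds the error

w, leftover = py_stick_breaking(d=0.25, alpha=1.0)
print(len(w), w.sum() + leftover)              # random truncation point; total mass 1
```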
Semiparametric Multivariate and Multiple Change-Point Modeling. Stefano Peluso, Siddhartha Chib, Antonietta Mira. Source: Bayesian Analysis, Volume 14, Number 3, 727–751. Abstract: We develop a general Bayesian semiparametric change-point model in which separate groups of structural parameters (for example, location and dispersion parameters) can each follow a separate multiple change-point process, driven by time-dependent transition matrices among the latent regimes. The distribution of the observations within regimes is unknown and given by a Dirichlet process mixture prior. The properties of the proposed model are studied theoretically through the analysis of inter-arrival times and of the number of change-points in a given time interval. The prior-posterior analysis by Markov chain Monte Carlo techniques is developed on a forward-backward algorithm for sampling the various regime indicators. Analysis with simulated data under various scenarios and an application to short-term interest rates are used to show the generality and usefulness of the proposed model.
A Bayesian Nonparametric Multiple Testing Procedure for Comparing Several Treatments Against a Control. Luis Gutiérrez, Andrés F. Barrientos, Jorge González, Daniel Taylor-Rodríguez. Source: Bayesian Analysis, Volume 14, Number 2, 649–675. Abstract: We propose a Bayesian nonparametric strategy to test for differences between a control group and several treatment regimes. Most of the existing tests for this type of comparison are based on the differences between location parameters. In contrast, our approach identifies differences across the entire distribution, avoids strong modeling assumptions over the distributions for each treatment, and accounts for multiple testing through the prior distribution on the space of hypotheses. The proposal is compared to other commonly used hypothesis testing procedures under simulated scenarios. Two real applications are also analyzed with the proposed methodology.
Efficient Acquisition Rules for Model-Based Approximate Bayesian Computation. Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen. Source: Bayesian Analysis, Volume 14, Number 2, 595–622. Abstract: Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies.
Fast Model-Fitting of Bayesian Variable Selection Regression Using the Iterative Complex Factorization Algorithm. Quan Zhou, Yongtao Guan. Source: Bayesian Analysis, Volume 14, Number 2, 573–594. Abstract: Bayesian variable selection regression (BVSR) is able to jointly analyze genome-wide genetic datasets, but the slow computation via Markov chain Monte Carlo (MCMC) has hampered its widespread usage. Here we present a novel iterative method to solve a special class of linear systems, which can increase the speed of the BVSR model-fitting tenfold. The iterative method hinges on the complex factorization of the sum of two matrices, and the solution path resides in the complex domain (instead of the real domain). Compared to the Gauss-Seidel method, the complex factorization converges almost instantaneously and its error is several orders of magnitude smaller than that of the Gauss-Seidel method. More importantly, its error is always within the pre-specified precision, whereas the Gauss-Seidel method's is not. For large problems with thousands of covariates, the complex factorization is 10–100 times faster than either the Gauss-Seidel method or the direct method via the Cholesky decomposition. In BVSR, one needs to repetitively solve large penalized regression systems whose design matrices only change slightly between adjacent MCMC steps. This slight change in design matrix enables the adaptation of the iterative complex factorization method. The computational innovation will facilitate the widespread use of BVSR in reanalyzing genome-wide association datasets.
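For context, the Gauss-Seidel iteration used as the baseline comparison above (the complex-factorization method itself is more involved; this sketch only shows the classical competitor):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, max_iter=10_000):
    """Iteratively solve A x = b, sweeping one coordinate at a time and
    always using the newest values of the other coordinates."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

# Diagonally dominant toy system with exact solution [1, 1, 1]
A = np.array([[10.0, 2.0, 1.0], [2.0, 8.0, 1.0], [1.0, 1.0, 5.0]])
b = np.array([13.0, 11.0, 7.0])
print(gauss_seidel(A, b))
```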
A Bayesian Nonparametric Spiked Process Prior for Dynamic Model Selection. Alberto Cassese, Weixuan Zhu, Michele Guindani, Marina Vannucci. Source: Bayesian Analysis, Volume 14, Number 2, 553–572. Abstract: In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this manuscript, we consider the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States, and propose a Bayesian nonparametric model selection approach to take into account the spatio-temporal dependence of outbreaks. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior which allows borrowing information across time and assigning data to clusters associated with either a null or an alternate process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare with common threshold methods for detecting outbreaks over time, with more recent Markov switching based models, and with spike-and-slab Bayesian nonparametric priors that do not take into account spatio-temporal dependence.
Constrained Bayesian Optimization with Noisy Experiments. Benjamin Letham, Brian Karrer, Guilherme Ottoni, Eytan Bakshy. Source: Bayesian Analysis, Volume 14, Number 2, 495–519. Abstract: Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting its applicability to many randomized experiments. We derive an expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Simulations with synthetic functions show that optimization performance on noisy, constrained problems outperforms existing methods. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system, and optimizing server compiler flags.
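A compact sketch of the core computation described above: expected improvement when both the candidate and the incumbent values are uncertain, estimated with quasi-Monte Carlo draws from the joint Gaussian posterior (the GP posterior mean and covariance are assumed given; constraints and batching are omitted):

```python
import numpy as np
from scipy.stats import qmc, norm

def qmc_noisy_ei(mu, Sigma, n=1024, seed=0):
    """E[max(0, f(candidate) - f(incumbent))] for a bivariate Gaussian
    posterior over (candidate, incumbent), via scrambled Sobol points."""
    sobol = qmc.Sobol(d=2, scramble=True, seed=seed)
    u = sobol.random(n)                  # low-discrepancy uniforms in (0, 1)
    z = norm.ppf(u)                      # map to standard normals
    L = np.linalg.cholesky(Sigma)
    f = mu + z @ L.T                     # joint posterior draws
    return np.maximum(0.0, f[:, 0] - f[:, 1]).mean()

mu = np.array([0.2, 0.0])                # candidate slightly better on average
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
print(qmc_noisy_ei(mu, Sigma))
```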
Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model. Łukasz Rajkowski. Source: Bayesian Analysis, Volume 14, Number 2, 477–494. Abstract: Mixture models are a natural choice in many applications, but it can be difficult to place an a priori upper bound on the number of components. To circumvent this, investigators are turning increasingly to Dirichlet process mixture models (DPMMs). It is therefore important to develop an understanding of the strengths and weaknesses of this approach. This work considers the MAP (maximum a posteriori) clustering for the Gaussian DPMM (where the cluster means have Gaussian distribution and, for each cluster, the observations within the cluster have Gaussian distribution). Some desirable properties of the MAP partition are proved: ‘almost disjointness’ of the convex hulls of clusters (they may have at most one point in common) and (with natural assumptions) the comparability of sizes of those clusters that intersect any fixed ball with the number of observations (as the latter goes to infinity). Consequently, the number of such clusters remains bounded. Furthermore, if the data arise from independent identically distributed sampling from a given distribution with bounded support, then the asymptotic MAP partition of the observation space maximises a function which has a straightforward expression, depending only on the within-group covariance parameter. As the operator norm of this covariance parameter decreases, the number of clusters in the MAP partition becomes arbitrarily large, which may lead to the overestimation of the number of mixture components.
Efficient Bayesian Regularization for Graphical Model Selection. Suprateek Kundu, Bani K. Mallick, Veera Baladandayuthapani. Source: Bayesian Analysis, Volume 14, Number 2, 449–476. Abstract: There has been an intense development in the Bayesian graphical model literature over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse-Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates. Subsequently, a post-fitting model selection step uses penalized joint credible regions to perform model selection. This allows our methods to be computationally feasible for large dimensional settings, using a combination of straightforward Gibbs samplers and efficient post-fitting inferences. Theoretical guarantees in terms of selection consistency are also established. Simulations show that the proposed approach compares favorably with competing methods, both in terms of accuracy metrics and computation times. We apply this approach to a cancer genomics data example.
A Bayesian Approach to Statistical Shape Analysis via the Projected Normal Distribution. Luis Gutiérrez, Eduardo Gutiérrez-Peña, Ramsés H. Mena. Source: Bayesian Analysis, Volume 14, Number 2, 427–447. Abstract: This work presents a Bayesian predictive approach to statistical shape analysis. A modeling strategy that starts with a Gaussian distribution on the configuration space, and then removes the effects of location, rotation and scale, is studied. This boils down to an application of the projected normal distribution to model the configurations in the shape space, which, together with certain identifiability constraints, facilitates parameter interpretation. Having better control over the parameters allows us to generalize the model to a regression setting where the effect of predictors on shapes can be considered. The methodology is illustrated and tested using both simulated scenarios and a real data set concerning eight anatomical landmarks on a sagittal plane of the corpus callosum in patients with autism and in a group of controls.
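The projected normal construction in one line: draw a Gaussian vector and radially project it onto the unit sphere, removing scale (a sketch of the building block, not the full shape-analysis model):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, 1.0])
x = rng.multivariate_normal(mu, np.eye(2), size=1000)
u = x / np.linalg.norm(x, axis=1, keepdims=True)   # projected normal draws on S^1
angles = np.arctan2(u[:, 1], u[:, 0])
print(angles.mean(), angles.std())                 # concentrated around atan2(1, 2)
```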
Control of Type I Error Rates in Bayesian Sequential Designs. Haolun Shi, Guosheng Yin. Source: Bayesian Analysis, Volume 14, Number 2, 399–425. Abstract: Bayesian approaches to phase II clinical trial designs are usually based on the posterior distribution of the parameter of interest and the calibration of a threshold for decision making. If the posterior probability is computed and assessed in a sequential manner, the design may involve the problem of multiplicity, which, however, is often a neglected aspect in Bayesian trial designs. To effectively maintain the overall type I error rate, we propose solutions to the problem of multiplicity for Bayesian sequential designs and, in particular, the determination of the cutoff boundaries for the posterior probabilities. We present both theoretical and numerical methods for finding the optimal posterior probability boundaries with $\alpha$-spending functions that mimic those of the frequentist group sequential designs. The theoretical approach is based on the asymptotic properties of the posterior probability, which establishes a connection between the Bayesian trial design and the frequentist group sequential method. The numerical approach uses a sandwich-type searching algorithm, which immensely reduces the computational burden. We apply least-squares fitting to find the $\alpha$-spending function closest to the target. We discuss the application of our method to single-arm and double-arm cases with binary and normal endpoints, respectively, and provide a real trial example for each case.
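A small simulation illustrating the multiplicity problem the paper addresses: with repeated posterior-probability looks at a fixed cutoff, the overall type I error of a single-arm binary-endpoint design exceeds the nominal level of any single look (the design and numbers here are illustrative; the paper derives the cutoff boundaries rather than fixing them by hand):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)
p0, a, b = 0.3, 1.0, 1.0                   # null rate and Beta(1, 1) prior
looks, cutoff, n_sim = [10, 20, 30, 40], 0.95, 10_000

rejections = 0
for _ in range(n_sim):
    y = rng.binomial(1, p0, size=max(looks))   # data generated under H0
    for n in looks:
        s = y[:n].sum()
        post_prob = 1.0 - beta.cdf(p0, a + s, b + n - s)  # P(p > p0 | data)
        if post_prob > cutoff:                 # stop early for efficacy
            rejections += 1
            break

print(f"overall type I error with 4 looks: {rejections / n_sim:.3f}")  # > 0.05
```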
Bayesian Effect Fusion for Categorical Predictors. Daniela Pauger, Helga Wagner. Source: Bayesian Analysis, Volume 14, Number 2, 341–369. Abstract: We propose a Bayesian approach to obtain a sparse representation of the effect of a categorical predictor in regression type models. As this effect is captured by a group of level effects, sparsity cannot only be achieved by excluding single irrelevant level effects or the whole group of effects associated with this predictor, but also by fusing levels which have essentially the same effect on the response. To achieve this goal, we propose a prior which allows for almost perfect as well as almost zero dependence between level effects a priori. This prior can alternatively be obtained by specifying spike and slab prior distributions on all effect differences associated with this categorical predictor. We show how restricted fusion can be implemented and develop an efficient MCMC (Markov chain Monte Carlo) method for posterior computation. The performance of the proposed method is investigated on simulated data, and we illustrate its application on real data from EU-SILC (European Union Statistics on Income and Living Conditions).
Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Peter D. Hoff. Source: Bayesian Analysis, Volume 6, Number 2, 179–196. Abstract: Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we discuss an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We show how a particular array-matrix product can be used to generate the class of array normal distributions having separable covariance structure. We derive some properties of these covariance structures and the corresponding array normal distributions, and show how the array-matrix product can be used to define a semi-conjugate prior distribution and calculate the corresponding posterior distribution. We illustrate the methodology in an analysis of multivariate longitudinal network data which take the form of a four-way array.
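A sketch of the array-matrix (Tucker) product used to generate array normal variates with separable covariance: start from an i.i.d. standard normal array Z and multiply each mode by a square-root factor of that mode's covariance matrix (the toy AR(1)-style covariances are illustrative):

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply array T along the given mode by matrix M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(3)
dims = (4, 3, 5)
Z = rng.standard_normal(dims)

# Cholesky factors of per-mode covariance matrices (AR(1)-style toys)
factors = [np.linalg.cholesky(0.5 ** np.abs(np.subtract.outer(r, r)))
           for r in (np.arange(d) for d in dims)]

X = Z
for mode, A in enumerate(factors):
    X = mode_product(X, A, mode)   # X = Z x_1 A_1 x_2 A_2 x_3 A_3

print(X.shape)                     # (4, 3, 5): one array normal draw
```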
Maximum Independent Component Analysis with Application to EEG Data. Ruosi Guo, Chunming Zhang, Zhengjun Zhang. Source: Statistical Science, Volume 35, Number 1, 145–157. Abstract: In many scientific disciplines, finding hidden influential factors behind observational data is essential but challenging. The majority of existing approaches, such as the independent component analysis ($\mathrm{ICA}$), rely on linear transformation, that is, true signals are linear combinations of hidden components. Motivated by analyzing nonlinear temporal signals in neuroscience, genetics, and finance, this paper proposes the “maximum independent component analysis” ($\mathrm{MaxICA}$), based on max-linear combinations of components. In contrast to existing methods, $\mathrm{MaxICA}$ benefits from focusing on significant major components while filtering out ignorable components. A major tool for parameter learning of $\mathrm{MaxICA}$ is an augmented genetic algorithm, consisting of three schemes for the elite weighted sum selection, randomly combined crossover, and dynamic mutation. Extensive empirical evaluations demonstrate the effectiveness of $\mathrm{MaxICA}$ in either extracting max-linearly combined essential sources in many applications or supplying a better approximation for nonlinearly combined source signals, such as $\mathrm{EEG}$ recordings analyzed in this paper.
Statistical Inference for the Evolutionary History of Cancer Genomes. Khanh N. Dinh, Roman Jaksik, Marek Kimmel, Amaury Lambert, Simon Tavaré. Source: Statistical Science, Volume 35, Number 1, 129–144. Abstract: Recent years have seen considerable work on inference about cancer evolution from mutations identified in cancer samples. Much of the modeling work has been based on classical models of population genetics, generalized to accommodate time-varying cell population size. Reverse-time, genealogical views of such models, commonly known as coalescents, have been used to infer aspects of the past of growing populations. Another approach is to use branching processes, the simplest scenario being the classical linear birth-death process. Inference from evolutionary models of DNA often exploits summary statistics of the sequence data, a common one being the so-called Site Frequency Spectrum (SFS). In a bulk tumor sequencing experiment, we can estimate for each site at which a novel somatic point mutation has arisen, the proportion of cells that carry that mutation. These numbers are then grouped into collections of sites which have similar mutant fractions. We examine how the SFS based on birth-death processes differs from those based on the coalescent model. This may stem from the different sampling mechanisms in the two approaches. However, we also show that despite this, they are quantitatively comparable for the range of parameters typical for tumor cell populations. We also present a model of tumor evolution with selective sweeps, and demonstrate how it may help in understanding the history of a tumor as well as the influence of data pre-processing. We illustrate the theory with applications to several examples from The Cancer Genome Atlas tumors.
Data Denoising and Post-Denoising Corrections in Single Cell RNA Sequencing. Divyansh Agarwal, Jingshu Wang, Nancy R. Zhang. Source: Statistical Science, Volume 35, Number 1, 112–128. Abstract: Single cell sequencing technologies are transforming biomedical research. However, due to the inherent nature of the data, single cell RNA sequencing analysis poses new computational and statistical challenges. We begin with a survey of a selection of topics in this field, with a gentle introduction to the biology and a more detailed exploration of the technical noise. We consider in detail the problem of single cell data denoising, sometimes referred to as “imputation” in the relevant literature. We discuss why this is not a typical statistical imputation problem, and review current approaches to this problem. We then explore why the use of denoised values in downstream analyses invites novel statistical insights, and how denoising uncertainty should be accounted for to yield valid statistical inference. The utilization of denoised or imputed matrices in statistical inference is not unique to single cell genomics, and arises in many other fields. We describe the challenges in this type of analysis, discuss some preliminary solutions, and highlight unresolved issues.
Statistical Molecule Counting in Super-Resolution Fluorescence Microscopy: Towards Quantitative Nanoscopy. Thomas Staudt, Timo Aspelmeier, Oskar Laitenberger, Claudia Geisler, Alexander Egner, Axel Munk. Source: Statistical Science, Volume 35, Number 1, 92–111. Abstract: Super-resolution microscopy is rapidly gaining importance as an analytical tool in the life sciences. A compelling feature is the ability to label biological units of interest with fluorescent markers in (living) cells and to observe them with considerably higher resolution than conventional microscopy permits. The images obtained this way, however, lack an absolute intensity scale in terms of numbers of fluorophores observed. In this article, we discuss state-of-the-art methods to count such fluorophores and statistical challenges that come along with it. In particular, we suggest a modeling scheme for time series generated by single-marker-switching (SMS) microscopy that makes it possible to quantify the number of markers in a statistically meaningful manner from the raw data. To this end, we model the entire process of photon generation in the fluorophore, their passage through the microscope, detection and photoelectron amplification in the camera, and extraction of time series from the microscopic images. At the heart of these modeling steps is a careful description of the fluorophore dynamics by a novel hidden Markov model that operates on two timescales (HTMM). Besides the fluorophore number, information about the kinetic transition rates of the fluorophore’s internal states is also inferred during estimation. We comment on computational issues that arise when applying our model to simulated or measured fluorescence traces and illustrate our methodology on simulated data.
Quantum Science and Quantum Technology. Yazhen Wang, Xinyu Song. Source: Statistical Science, Volume 35, Number 1, 51–74. Abstract: Quantum science and quantum technology are of great current interest in multiple frontiers of many scientific fields ranging from computer science to physics and chemistry, and from engineering to mathematics and statistics. Their developments will likely lead to a new wave of scientific revolutions and technological innovations in a wide range of scientific studies and applications. This paper provides a brief review on quantum communication, quantum information, quantum computation, quantum simulation, and quantum metrology. We present essential quantum properties, illustrate relevant concepts of quantum science and quantum technology, and discuss their scientific developments. We point out the need for statistical analysis in their developments, as well as their potential applications to and impacts on statistics and data science.
Risk Models for Breast Cancer and Their Validation. Adam R. Brentnall, Jack Cuzick. Source: Statistical Science, Volume 35, Number 1, 14–30. Abstract: Strategies to prevent cancer and diagnose it early when it is most treatable are needed to reduce the public health burden from rising disease incidence. Risk assessment is playing an increasingly important role in targeting individuals in need of such interventions. For breast cancer many individual risk factors have been well understood for a long time, but the development of a fully comprehensive risk model has not been straightforward, in part because there have been limited data where joint effects of an extensive set of risk factors may be estimated with precision. In this article we first review the approach taken to develop the IBIS (Tyrer–Cuzick) model, and describe recent updates. We then review and develop methods to assess calibration of models such as this one, where the risk of disease allowing for competing mortality over a long follow-up time or lifetime is estimated. The breast cancer risk model and calibration assessment methods are demonstrated using a cohort of 132,139 women attending mammography screening in the State of Washington, USA.
Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression. Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong. Source: Statistical Science, Volume 35, Number 1, 2–13. Abstract: Unsupervised methods, including clustering methods, are essential to the analysis of single-cell genomic data. Model-based clustering methods are under-explored in the area of single-cell genomics, and have the advantage of quantifying the uncertainty of the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that by combining these two types of data, we can achieve a better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed.
Gaussianization Machines for Non-Gaussian Function Estimation Models. T. Tony Cai. Source: Statistical Science, Volume 34, Number 4, 635–656. Abstract: A wide range of nonparametric function estimation models have been studied individually in the literature. Among them the homoscedastic nonparametric Gaussian regression is arguably the best known and understood. Inspired by the asymptotic equivalence theory, Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433) developed a unified approach to turn a collection of non-Gaussian function estimation models into a standard Gaussian regression, and any good Gaussian nonparametric regression method can then be used. These Gaussianization Machines have two key components, binning and transformation. When combined with BlockJS, a wavelet thresholding procedure for Gaussian regression, the procedures are computationally efficient with strong theoretical guarantees. Technical analysis given in Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433) shows that the estimators attain the optimal rate of convergence adaptively over a large set of Besov spaces and across a collection of non-Gaussian function estimation models, including robust nonparametric regression, density estimation, and nonparametric regression in exponential families. The estimators are also spatially adaptive. The Gaussianization Machines significantly extend the flexibility and scope of the theories and methodologies originally developed for the conventional nonparametric Gaussian regression. This article aims to provide a concise account of the Gaussianization Machines developed in Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433).
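A sketch of the two Gaussianization steps for density or Poisson-type data, assuming the mean-matched root transform sqrt(count + 1/4), which is approximately N(sqrt(lambda), 1/4); any good Gaussian nonparametric regression (e.g., wavelet block thresholding) would then be run on the transformed bin values:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=50_000)          # raw observations
T = 256                                              # number of bins
counts, edges = np.histogram(x, bins=T, range=(0, 6))

y = np.sqrt(counts + 0.25)   # approximately Gaussian, variance ~ 1/4
# In practice one smooths y with a Gaussian regression method before
# "unrooting"; without smoothing this just recovers the raw histogram.
centers = 0.5 * (edges[:-1] + edges[1:])
density_est = (y ** 2 - 0.25) / (len(x) * (edges[1] - edges[0]))
print(centers[:3], density_est[:3])                  # crude density estimate
```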
Larry Brown’s Contributions to Parametric Inference, Decision Theory and Foundations: A Survey. James O. Berger, Anirban DasGupta. Source: Statistical Science, Volume 34, Number 4, 621–634. Abstract: This article gives a panoramic survey of the general area of parametric statistical inference, decision theory and foundations of statistics for the period 1965–2010 through the lens of Larry Brown’s contributions to varied aspects of this massive area. The article goes over sufficiency, shrinkage estimation, admissibility, minimaxity, complete class theorems, estimated confidence, conditional confidence procedures, Edgeworth and higher order asymptotic expansions, variational Bayes, Stein’s SURE, differential inequalities, geometrization of convergence rates, asymptotic equivalence, aspects of empirical process theory, inference after model selection, unified frequentist and Bayesian testing, and Wald’s sequential theory. A reasonably comprehensive bibliography is provided.
Comment: “Models as Approximations I: Consequences Illustrated with Linear Regression” by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, L. Zhao and K. Zhang. Roderick J. Little. Source: Statistical Science, Volume 34, Number 4, 580–583.
Comment on Models as Approximations, Parts I and II, by Buja et al. Jerald F. Lawless. Source: Statistical Science, Volume 34, Number 4, 569–571. Abstract: I comment on the papers Models as Approximations I and II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang.
Assessing the Causal Effect of Binary Interventions from Observational Panel Data with Few Treated Units. Pantelis Samartsidis, Shaun R. Seaman, Anne M. Presanis, Matthew Hickman, Daniela De Angelis. Source: Statistical Science, Volume 34, Number 3, 486–503. Abstract: Researchers are often challenged with assessing the impact of an intervention on an outcome of interest in situations where the intervention is nonrandomised, the intervention is only applied to one or a few units, the intervention is binary, and outcome measurements are available at multiple time points. In this paper, we review existing methods for causal inference in these situations. We detail the assumptions underlying each method, emphasize connections between the different approaches and provide guidelines regarding their practical implementation. Several open problems are identified, thus highlighting the need for future research.
Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Daniele Durante, Tommaso Rigon. Source: Statistical Science, Volume 34, Number 3, 472–485. Abstract: Variational Bayes (VB) is a common strategy for approximate Bayesian inference, but simple methods are only available for specific classes of models including, in particular, representations having conditionally conjugate constructions within an exponential family. Models with logit components are an apparently notable exception to this class, due to the absence of conjugacy among the logistic likelihood and the Gaussian priors for the coefficients in the linear predictor. To facilitate approximate inference within this widely used class of models, Jaakkola and Jordan (Stat. Comput. 10 (2000) 25–37) proposed a simple variational approach which relies on a family of tangent quadratic lower bounds of the logistic log-likelihood, thus restoring conjugacy between these approximate bounds and the Gaussian priors. This strategy is still implemented successfully, but few attempts have been made to formally understand the reasons underlying its excellent performance. Following a review on VB for logistic models, we cover this gap by providing a formal connection between the above bound and a recent Pólya-gamma data augmentation for logistic regression. Such a result places the computational methods associated with the aforementioned bounds within the framework of variational inference for conditionally conjugate exponential family models, thereby allowing recent advances for this class to be inherited also by the methods relying on Jaakkola and Jordan (Stat. Comput. 10 (2000) 25–37).
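A compact implementation of the Jaakkola-Jordan bound for Bayesian logistic regression: iterate the closed-form Gaussian update for the coefficients and the variational parameters xi_i, with lambda(xi) = tanh(xi/2)/(4 xi). This is the classical scheme the paper studies, written as a minimal sketch with a zero-mean Gaussian prior:

```python
import numpy as np

def jj_logistic_vb(X, y, prior_var=10.0, n_iter=50):
    n, p = X.shape
    S0_inv = np.eye(p) / prior_var
    xi = np.ones(n)                               # variational parameters
    for _ in range(n_iter):
        lam = np.tanh(xi / 2.0) / (4.0 * xi)      # lambda(xi)
        S = np.linalg.inv(S0_inv + 2.0 * (X.T * lam) @ X)
        m = S @ (X.T @ (y - 0.5))                 # Gaussian posterior mean
        xi = np.sqrt(np.einsum('ij,jk,ik->i', X, S + np.outer(m, m), X))
    return m, S

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 2))
beta_true = np.array([1.0, -2.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
m, S = jj_logistic_vb(X, y)
print(m)    # approximate posterior mean, close to beta_true
```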
User-Friendly Covariance Estimation for Heavy-Tailed Distributions. Yuan Ke, Stanislav Minsker, Zhao Ren, Qiang Sun, Wen-Xin Zhou. Source: Statistical Science, Volume 34, Number 3, 454–471. Abstract: We provide a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce elementwise and spectrumwise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key insight is that estimators should adapt to the sample size, dimensionality and noise level to achieve optimal tradeoff between bias and robustness. Furthermore, to facilitate practical implementation, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.
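A sketch of the elementwise truncation idea for heavy-tailed data: truncate each centered product at a level tau growing like sqrt(n / log d), then average. The constant in tau is a tuning parameter that the paper calibrates in a data-driven way; here it is fixed by hand:

```python
import numpy as np

def truncated_covariance(X, c=2.0):
    """Elementwise-truncated covariance estimator (illustrative tuning)."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)                      # center
    tau = c * np.sqrt(n / np.log(d))             # truncation level
    prods = Xc[:, :, None] * Xc[:, None, :]      # n x d x d entrywise products
    trunc = np.sign(prods) * np.minimum(np.abs(prods), tau)
    return trunc.mean(axis=0)

rng = np.random.default_rng(6)
X = rng.standard_t(df=2.5, size=(200, 5))        # heavy-tailed sample
S_trunc = truncated_covariance(X)
S_plain = np.cov(X, rowvar=False)
print(np.linalg.norm(S_trunc), np.linalg.norm(S_plain))  # truncation tames outliers
```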
An Overview of Semiparametric Extensions of Finite Mixture Models. Sijia Xiang, Weixin Yao, Guangren Yang. Source: Statistical Science, Volume 34, Number 3, 391–404. Abstract: Finite mixture models have offered a very important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology and finance. Semiparametric mixture models, which were introduced into traditional finite mixture models in the past decade, have brought forth exciting developments in their methodologies, theories, and applications. In this article, we not only provide a selective overview of the newly developed semiparametric mixture models, but also discuss their estimation methodologies, theoretical properties if applicable, and some open questions. Recent developments are also discussed.
Two-Sample Instrumental Variable Analyses Using Heterogeneous Samples. Qingyuan Zhao, Jingshu Wang, Wes Spiller, Jack Bowden, Dylan S. Small. Source: Statistical Science, Volume 34, Number 2, 317–333. Abstract: Instrumental variable analysis is a widely used method to estimate causal effects in the presence of unmeasured confounding. When the instruments, exposure and outcome are not measured in the same sample, Angrist and Krueger (J. Amer. Statist. Assoc. 87 (1992) 328–336) suggested to use two-sample instrumental variable (TSIV) estimators that use sample moments from an instrument-exposure sample and an instrument-outcome sample. However, this method is biased if the two samples are from heterogeneous populations so that the distributions of the instruments are different. In linear structural equation models, we derive a new class of TSIV estimators that are robust to heterogeneous samples under the key assumption that the structural relations in the two samples are the same. The widely used two-sample two-stage least squares estimator belongs to this class. It is generally not asymptotically efficient, although we find that it performs similarly to the optimal TSIV estimator in most practical situations. We then attempt to relax the linearity assumption. We find that, unlike one-sample analyses, the TSIV estimator is not robust to a misspecified exposure model. Additionally, to nonparametrically identify the magnitude of the causal effect, the noise in the exposure must have the same distribution in the two samples. However, this assumption is in general untestable because the exposure is not observed in one sample. Nonetheless, we may still identify the sign of the causal effect in the absence of homogeneity of the noise.
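A minimal sketch of the two-sample two-stage least squares estimator mentioned above: fit the instrument-exposure regression in sample 1, predict the exposure from the instruments observed in sample 2, and regress the outcome in sample 2 on that prediction (toy data; homogeneous samples assumed):

```python
import numpy as np

rng = np.random.default_rng(7)

def make_sample(n):
    z = rng.standard_normal((n, 1))                  # instrument
    u = rng.standard_normal(n)                       # unmeasured confounder
    x = 0.8 * z[:, 0] + u + rng.standard_normal(n)   # exposure
    y = 0.5 * x + u + rng.standard_normal(n)         # outcome, true effect 0.5
    return z, x, y

z1, x1, _ = make_sample(5000)                        # instrument-exposure sample
z2, _, y2 = make_sample(5000)                        # instrument-outcome sample

Z1 = np.column_stack([np.ones(len(z1)), z1])
gamma = np.linalg.lstsq(Z1, x1, rcond=None)[0]       # first stage in sample 1

Z2 = np.column_stack([np.ones(len(z2)), z2])
x2_hat = Z2 @ gamma                                  # predicted exposure in sample 2
X2 = np.column_stack([np.ones(len(z2)), x2_hat])
beta = np.linalg.lstsq(X2, y2, rcond=None)[0]
print(beta[1])                                       # approximately 0.5
```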
an Producing Official County-Level Agricultural Estimates in the United States: Needs and Challenges By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Nathan B. Cruze, Andreea L. Erciulescu, Balgobin Nandram, Wendy J. Barboza, Linda J. Young. Source: Statistical Science, Volume 34, Number 2, 301--316.Abstract: In the United States, county-level estimates of crop yield, production, and acreage published by the United States Department of Agriculture’s National Agricultural Statistics Service (USDA NASS) play an important role in determining the value of payments allotted to farmers and ranchers enrolled in several federal programs. Given the importance of these official county-level crop estimates, NASS continually strives to improve its county estimates program for crops in terms of accuracy, reliability and coverage. In 2015, NASS engaged a panel of experts, convened under the auspices of the National Academies of Sciences, Engineering, and Medicine Committee on National Statistics (CNSTAT), for guidance on implementing models that may synthesize multiple sources of information into a single estimate, provide defensible measures of uncertainty, and potentially increase the number of publishable county estimates. The panel’s final report, titled Improving Crop Estimates by Integrating Multiple Data Sources, was released in 2017. This paper discusses several needs and requirements for NASS county-level crop estimates that were illuminated during the activities of the CNSTAT panel. A motivating example of planted acreage estimation in Illinois illustrates several challenges that NASS faces as it considers adopting an explicit model for official county-level crop estimates. Full Article
an The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015 By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Laura Anderlucci, Angela Montanari, Cinzia Viroli. Source: Statistical Science, Volume 34, Number 2, 280--300.Abstract: In this paper, we retrace the recent history of statistics by analyzing all the papers published since 1970 in five prestigious statistical journals: The Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, and Statistical Science. The aim is to construct a kind of “taxonomy” of statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve or die over time, we also develop a dynamic clustering strategy, in which a group in one time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data. Full Article
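One simple reading of such a dynamic clustering strategy, offered only as a sketch and not as the authors' method: cluster the papers of each time period separately (for instance on tf-idf representations of their abstracts, a hypothetical input here), then link each cluster to its most similar successor so that groups can migrate; several groups mapping to one successor register as a merge.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def dynamic_clusters(features_by_period, k=5, seed=0):
    """Cluster each period, then chain clusters across adjacent periods.

    features_by_period: list of (n_t, d) arrays, one per time period."""
    centroids, labels = [], []
    for X in features_by_period:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        centroids.append(km.cluster_centers_)
        labels.append(km.labels_)
    # cluster j in period t is linked to links[t][j] in period t+1;
    # several clusters sharing a successor represents a merge
    links = [cosine_similarity(centroids[t], centroids[t + 1]).argmax(axis=1)
             for t in range(len(centroids) - 1)]
    return labels, links
```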
an Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Lei Liu, Ya-Chen Tina Shih, Robert L. Strawderman, Daowen Zhang, Bankole A. Johnson, Haitao Chai. Source: Statistical Science, Volume 34, Number 2, 253--279.Abstract: Zero-inflated nonnegative continuous (or semicontinuous) data arise frequently in biomedical, economic, and ecological studies. Examples include substance abuse, medical costs, medical care utilization, biomarkers (e.g., CD4 cell counts, coronary artery calcium scores), single cell gene expression rates, and (relative) abundance of microbiome. Such data are often characterized by a large proportion of zero values together with positive continuous values that are skewed to the right and heteroscedastic. Both features suggest that no simple parametric distribution may be suitable for modeling this type of outcome. In this paper, we review statistical methods for analyzing zero-inflated nonnegative outcome data. We start with the cross-sectional setting, discussing ways to separate zero and positive values and introducing flexible models to characterize the right skewness and heteroscedasticity of the positive values. We then present models for correlated zero-inflated nonnegative continuous data, using random effects to tackle the correlation among repeated measures from the same subject and across different parts of the model. We also discuss extensions to related topics, for example, zero-inflated count and survival data, nonlinear covariate effects, and joint models of longitudinal zero-inflated nonnegative continuous data and survival. Finally, we present applications to three real datasets (i.e., microbiome, medical costs, and alcohol drinking) to illustrate these methods. Example code is provided to facilitate application of these methods. Full Article
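The simplest of the reviewed cross-sectional approaches is a two-part model: a logistic regression for whether the outcome is positive, and a right-skew-friendly regression (log-normal here) for its size given positivity. A minimal sketch using statsmodels follows; it is one standard instance of the family the review covers, not a reproduction of the paper's own example code.

```python
import numpy as np
import statsmodels.api as sm

def two_part_fit(X, y):
    """Two-part model for semicontinuous outcomes: logistic for P(Y > 0),
    log-normal regression for the positive part (handles right skew)."""
    Xc = sm.add_constant(X)
    part1 = sm.Logit((y > 0).astype(float), Xc).fit(disp=0)
    pos = y > 0
    part2 = sm.OLS(np.log(y[pos]), Xc[pos]).fit()
    return part1, part2

def two_part_mean(part1, part2, X_new):
    """E[Y | X] = P(Y > 0 | X) * E[Y | Y > 0, X] under the log-normal part."""
    Xc = sm.add_constant(X_new)
    p = part1.predict(Xc)
    mu, s2 = part2.predict(Xc), part2.scale   # .scale = residual variance
    return p * np.exp(mu + s2 / 2.0)
```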
an A Kernel Regression Procedure in the 3D Shape Space with an Application to Online Sales of Children’s Wear By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Gregorio Quintana-Ortí, Amelia Simó. Source: Statistical Science, Volume 34, Number 2, 236--252.Abstract: This paper focuses on kernel regression when the response variable is the shape of a 3D object represented by a configuration matrix of landmarks. Regression methods on this shape space are not trivial because the space has a complex, non-Euclidean, finite-dimensional Riemannian manifold structure. Papers on this problem are scarce in the literature; the majority are restricted to the case of a single explanatory variable, and many are based on an approximation via the tangent space. This paper makes several methodological innovations. The first is the adaptation of the general method for kernel regression analysis of manifold-valued data to the three-dimensional case of Kendall’s shape space. The second is its generalization to the multivariate case, together with a treatment of the curse-of-dimensionality problem. Finally, we propose bootstrap confidence intervals for prediction. A simulation study is carried out to assess the performance of the procedure, and a comparison with a current approach is performed. The method is then applied to a 3D database obtained from an anthropometric survey of the Spanish child population, with a potential application to online sales of children’s wear. Full Article
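The two ingredients of kernel regression with manifold-valued responses are kernel weights computed from the covariates and a weighted Fréchet mean of the responses on the manifold. The sketch below illustrates that structure on the unit sphere, whose log and exp maps are simple; Kendall's shape space has a richer quotient geometry, so this shows the recipe rather than the paper's procedure.

```python
import numpy as np

def nw_weights(x_train, x0, h):
    """Gaussian kernel weights for Nadaraya-Watson regression at x0."""
    d2 = np.sum((x_train - x0) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / h ** 2)
    return w / w.sum()

def weighted_frechet_mean_sphere(Y, w, iters=20):
    """Weighted Frechet mean of unit vectors Y (n, d) on the sphere,
    by iterating: log map at m, weighted tangent average, exp map back."""
    m = Y[np.argmax(w)]                                # start at heaviest point
    for _ in range(iters):
        angles = np.arccos(np.clip(Y @ m, -1.0, 1.0))  # geodesic distances
        proj = Y - np.outer(Y @ m, m)                  # component orthogonal to m
        norms = np.maximum(np.linalg.norm(proj, axis=1, keepdims=True), 1e-12)
        logs = (proj / norms) * angles[:, None]        # log map of each Y_i
        v = (w[:, None] * logs).sum(axis=0)            # weighted tangent mean
        t = np.linalg.norm(v)
        if t < 1e-12:
            break
        m = np.cos(t) * m + np.sin(t) * (v / t)        # exp map back to sphere
    return m
```

The regression prediction at x0 is then `weighted_frechet_mean_sphere(Y, nw_weights(x_train, x0, h))`, mirroring how Nadaraya-Watson replaces a weighted average with a Riemannian one.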
an Rejoinder: Bayes, Oracle Bayes, and Empirical Bayes By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Bradley Efron. Source: Statistical Science, Volume 34, Number 2, 234--235. Full Article
an Comment: Empirical Bayes, Compound Decisions and Exchangeability By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Eitan Greenshtein, Ya’acov Ritov. Source: Statistical Science, Volume 34, Number 2, 224--228.Abstract: We present some personal reflections on empirical Bayes/compound decision (EB/CD) theory following Efron (2019). In particular, we consider the role of exchangeability in EB/CD theory and how it can be achieved when there are covariates. We also discuss the interpretation of EB/CD confidence intervals, the theoretical efficiency of the CD procedure, and the impact of sparsity assumptions. Full Article
an Comment: Bayes, Oracle Bayes and Empirical Bayes By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Aad van der Vaart. Source: Statistical Science, Volume 34, Number 2, 214--218. Full Article
an Comment: Bayes, Oracle Bayes, and Empirical Bayes By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Nan Laird. Source: Statistical Science, Volume 34, Number 2, 206--208. Full Article