el Boeing says it's about to start building the 737 Max plane again in the middle of the coronavirus pandemic, even though it already has more planes than it can deliver By news.yahoo.com Published On :: Fri, 08 May 2020 12:44:06 -0400 Boeing CEO Dave Calhoun said the company was aiming to resume production this month, despite the ongoing grounding and coronavirus pandemic. Full Article
el Delta, citing health concerns, drops service to 10 US airports. Is yours on the list? By news.yahoo.com Published On :: Fri, 08 May 2020 18:41:45 -0400 Delta said it is making the move to protect employees amid the coronavirus pandemic, but planes have been flying near empty Full Article
el ‘Selfish, tribal and divided’: Barack Obama warns of changes to American way of life in leaked audio slamming Trump administration By news.yahoo.com Published On :: Sat, 09 May 2020 07:22:00 -0400 Barack Obama said the “rule of law is at risk” following the justice department’s decision to drop charges against former Trump advisor Mike Flynn, as he issued a stark warning about the long-term impact on the American way of life by his successor. Full Article
el The McMichaels can't be charged with a hate crime by the state in the shooting death of Ahmaud Arbery because the law doesn't exist in Georgia By news.yahoo.com Published On :: Fri, 08 May 2020 17:07:36 -0400 Georgia is one of four states that doesn't have a hate crime law. Arbery's killing has reignited calls for legislation. Full Article
el Nearly one-third of Americans believe a coronavirus vaccine exists and is being withheld, survey finds By news.yahoo.com Published On :: Fri, 08 May 2020 16:49:35 -0400 The Democracy Fund + UCLA Nationscape Project found some misinformation about the coronavirus is more widespread than you might think. Full Article
el A Loss-Based Prior for Variable Selection in Linear Regression Methods By projecteuclid.org Published On :: Thu, 19 Mar 2020 22:02 EDT Cristiano Villa, Jeong Eun Lee. Source: Bayesian Analysis, Volume 15, Number 2, 533--558.Abstract: In this work we propose a novel model prior for variable selection in linear regression. The idea is to determine the prior mass by considering the worth of each of the regression models, given the number of possible covariates under consideration. The worth of a model consists of the information loss and the loss due to model complexity. While the information loss is determined objectively, the loss expression due to model complexity is flexible, and the penalty on model size can even be customized to include some prior knowledge. Some versions of the loss-based prior are proposed and compared empirically. Through simulation studies and real data analyses, we compare the proposed prior to the Scott and Berger prior, for noninformative scenarios, and with the Beta-Binomial prior, for informative scenarios. Full Article
el Joint Modeling of Longitudinal Relational Data and Exogenous Variables By projecteuclid.org Published On :: Thu, 19 Mar 2020 22:02 EDT Rajarshi Guhaniyogi, Abel Rodriguez. Source: Bayesian Analysis, Volume 15, Number 2, 477--503.Abstract: This article proposes a framework based on shared, time varying stochastic latent factor models for modeling relational data in which network and node-attributes co-evolve over time. Our proposed framework is flexible enough to handle both categorical and continuous attributes, allows us to estimate the dimension of the latent social space, and automatically yields Bayesian hypothesis tests for the association between network structure and nodal attributes. Additionally, the model is easy to compute and readily yields inference and prediction for missing link between nodes. We employ our model framework to study co-evolution of international relations between 22 countries and the country specific indicators over a period of 11 years. Full Article
el Bayesian Inference in Nonparanormal Graphical Models By projecteuclid.org Published On :: Thu, 19 Mar 2020 22:02 EDT Jami J. Mulgrave, Subhashis Ghosal. Source: Bayesian Analysis, Volume 15, Number 2, 449--475.Abstract: Gaussian graphical models have been used to study intrinsic dependence among several variables, but the Gaussianity assumption may be restrictive in many applications. A nonparanormal graphical model is a semiparametric generalization for continuous variables where it is assumed that the variables follow a Gaussian graphical model only after some unknown smooth monotone transformations on each of them. We consider a Bayesian approach in the nonparanormal graphical model by putting priors on the unknown transformations through a random series based on B-splines where the coefficients are ordered to induce monotonicity. A truncated normal prior leads to partial conjugacy in the model and is useful for posterior simulation using Gibbs sampling. On the underlying precision matrix of the transformed variables, we consider a spike-and-slab prior and use an efficient posterior Gibbs sampling scheme. We use the Bayesian Information Criterion to choose the hyperparameters for the spike-and-slab prior. We present a posterior consistency result on the underlying transformation and the precision matrix. We study the numerical performance of the proposed method through an extensive simulation study and finally apply the proposed method on a real data set. Full Article
el Additive Multivariate Gaussian Processes for Joint Species Distribution Modeling with Heterogeneous Data By projecteuclid.org Published On :: Thu, 19 Mar 2020 22:02 EDT Jarno Vanhatalo, Marcelo Hartmann, Lari Veneranta. Source: Bayesian Analysis, Volume 15, Number 2, 415--447.Abstract: Species distribution models (SDM) are a key tool in ecology, conservation and management of natural resources. Two key components of the state-of-the-art SDMs are the description for species distribution response along environmental covariates and the spatial random effect that captures deviations from the distribution patterns explained by environmental covariates. Joint species distribution models (JSDMs) additionally include interspecific correlations which have been shown to improve their descriptive and predictive performance compared to single species models. However, current JSDMs are restricted to the hierarchical generalized linear modeling framework. Their limitation is that parametric models have trouble explaining changes in abundance due to, for example, highly non-linear physical tolerance limits, which is particularly important when predicting species distribution in new areas or under scenarios of environmental change. On the other hand, semi-parametric response functions have been shown to improve the predictive performance of SDMs in these tasks in single species models. Here, we propose JSDMs where the responses to environmental covariates are modeled with additive multivariate Gaussian processes coded as linear models of coregionalization. These allow inference for a wide range of functional forms and interspecific correlations between the responses. We also propose an efficient approach for inference with Laplace approximation and parameterization of the interspecific covariance matrices on the Euclidean space. We demonstrate the benefits of our model with two small scale examples and one real world case study. We use cross-validation to compare the proposed model to analogous semi-parametric single species models and parametric single and joint species models in interpolation and extrapolation tasks. The proposed model outperforms the alternative models in all cases. We also show that the proposed model can be seen as an extension of the current state-of-the-art JSDMs to semi-parametric models. Full Article
el Dynamic Quantile Linear Models: A Bayesian Approach By projecteuclid.org Published On :: Thu, 19 Mar 2020 22:02 EDT Kelly C. M. Gonçalves, Hélio S. Migon, Leonardo S. Bastos. Source: Bayesian Analysis, Volume 15, Number 2, 335--362.Abstract: The paper introduces a new class of models, named dynamic quantile linear models, which combines dynamic linear models with distribution-free quantile regression producing a robust statistical method. Bayesian estimation for the dynamic quantile linear model is performed using an efficient Markov chain Monte Carlo algorithm. The paper also proposes a fast sequential procedure suited for high-dimensional predictive modeling with massive data, where the generating process is changing over time. The proposed model is evaluated using synthetic and well-known time series data. The model is also applied to predict annual incidence of tuberculosis in the state of Rio de Janeiro and compared with global targets set by the World Health Organization. Full Article
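The quantile regression mentioned in the abstract above rests on the check (pinball) loss, whose minimiser over a constant is a sample quantile. The following is only a minimal static sketch of that loss, not the dynamic model or the MCMC algorithm of the paper; the function names `check_loss` and `best_constant` and the toy data are assumptions made for illustration.

```python
import numpy as np

def check_loss(residual, tau):
    """Check (pinball) loss: tau * r if r >= 0, else (tau - 1) * r."""
    residual = np.asarray(residual, dtype=float)
    return np.where(residual >= 0, tau * residual, (tau - 1.0) * residual)

def best_constant(y, tau, grid):
    """Return the grid value that minimises the average check loss (hypothetical helper)."""
    losses = [check_loss(y - q, tau).mean() for q in grid]
    return grid[int(np.argmin(losses))]

# Toy data (an assumption for illustration): the integers 1..100.
y = np.arange(1.0, 101.0)
grid = np.arange(0.5, 101.0, 0.5)

# The minimiser for tau = 0.9 should fall at the 0.9 sample quantile,
# which for this data is anywhere in the interval [90, 91].
q90 = best_constant(y, 0.9, grid)
```

The asymmetry of the loss is what targets a quantile rather than the mean: with `tau = 0.9`, under-prediction costs nine times as much as over-prediction.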
el A Novel Algorithmic Approach to Bayesian Logic Regression (with Discussion) By projecteuclid.org Published On :: Tue, 17 Mar 2020 04:00 EDT Aliaksandr Hubin, Geir Storvik, Florian Frommlet. Source: Bayesian Analysis, Volume 15, Number 1, 263--333.Abstract: Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has (partly due to computational challenges) remained less well known than other approaches to epistatic association mapping. Here we will adapt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL (quantitative trait locus) mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects. The method is implemented in an R package which is available on github. Full Article
el Learning Semiparametric Regression with Missing Covariates Using Gaussian Process Models By projecteuclid.org Published On :: Mon, 13 Jan 2020 04:00 EST Abhishek Bishoyi, Xiaojing Wang, Dipak K. Dey. Source: Bayesian Analysis, Volume 15, Number 1, 215--239.Abstract: Missing data often appear as a practical problem while applying classical models in the statistical analysis. In this paper, we consider a semiparametric regression model in the presence of missing covariates for nonparametric components under a Bayesian framework. Gaussian processes are known to be a popular tool in nonparametric regression because of their flexibility and the fact that much of the ensuing computation is parametric Gaussian computation. However, in the absence of covariates, the most frequently used covariance functions of a Gaussian process will not be well defined. We propose an imputation method to solve this issue and perform our analysis using Bayesian inference, where we specify the objective priors on the parameters of Gaussian process models. Several simulations are conducted to illustrate the effectiveness of our proposed method and further, our method is exemplified via two real datasets, one through the Langmuir equation, commonly used in pharmacokinetic models, and another through the Auto-mpg data taken from the StatLib library. Full Article
el Adaptive Bayesian Nonparametric Regression Using a Kernel Mixture of Polynomials with Application to Partial Linear Models By projecteuclid.org Published On :: Mon, 13 Jan 2020 04:00 EST Fangzheng Xie, Yanxun Xu. Source: Bayesian Analysis, Volume 15, Number 1, 159--186.Abstract: We propose a kernel mixture of polynomials prior for Bayesian nonparametric regression. The regression function is modeled by local averages of polynomials with kernel mixture weights. We obtain the minimax-optimal contraction rate of the full posterior distribution up to a logarithmic factor by estimating metric entropies of certain function classes. Under the assumption that the degree of the polynomials is larger than the unknown smoothness level of the true function, the posterior contraction behavior can adapt to this smoothness level provided an upper bound is known. We also provide a frequentist sieve maximum likelihood estimator with a near-optimal convergence rate. We further investigate the application of the kernel mixture of polynomials to partial linear models and obtain both the near-optimal rate of contraction for the nonparametric component and the Bernstein-von Mises limit (i.e., asymptotic normality) of the parametric component. The proposed method is illustrated with numerical examples and shows superior performance in terms of computational efficiency, accuracy, and uncertainty quantification compared to the local polynomial regression, DiceKriging, and the robust Gaussian stochastic process. Full Article
el Bayesian Design of Experiments for Intractable Likelihood Models Using Coupled Auxiliary Models and Multivariate Emulation By projecteuclid.org Published On :: Mon, 13 Jan 2020 04:00 EST Antony Overstall, James McGree. Source: Bayesian Analysis, Volume 15, Number 1, 103--131.Abstract: A Bayesian design is given by maximising an expected utility over a design space. The utility is chosen to represent the aim of the experiment and its expectation is taken with respect to all unknowns: responses, parameters and/or models. Although straightforward in principle, there are several challenges to finding Bayesian designs in practice. Firstly, the utility and expected utility are rarely available in closed form and require approximation. Secondly, the design space can be of high-dimensionality. In the case of intractable likelihood models, these problems are compounded by the fact that the likelihood function, whose evaluation is required to approximate the expected utility, is not available in closed form. A strategy is proposed to find Bayesian designs for intractable likelihood models. It relies on the development of an automatic, auxiliary modelling approach, using multivariate Gaussian process emulators, to approximate the likelihood function. This is then combined with a copula-based approach to approximate the marginal likelihood (a quantity commonly required to evaluate many utility functions). These approximations are demonstrated on examples of stochastic process models involving experimental aims of both parameter estimation and model comparison. Full Article
el Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior By projecteuclid.org Published On :: Mon, 13 Jan 2020 04:00 EST Qingpo Cai, Jian Kang, Tianwei Yu. Source: Bayesian Analysis, Volume 15, Number 1, 79--102.Abstract: Selecting informative nodes over large-scale networks becomes increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs for the large-scale problem. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers accounting for the global network structure. Under mild conditions, we show the proposed model enjoys the posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation, which is scalable to large-scale networks. We illustrate the superiorities of the proposed method compared with existing alternatives via extensive simulation studies and an analysis of the breast cancer gene expression dataset in the Cancer Genome Atlas (TCGA). Full Article
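The abstract above relies on the graph Laplacian matrix of the marker network. As a minimal sketch, assuming the unnormalised form L = D - A for an undirected, unweighted graph (the abstract does not say which variant the TGLG prior uses), the matrix can be built as follows. The helper name `graph_laplacian` and the four-node path graph are hypothetical.

```python
import numpy as np

def graph_laplacian(n_nodes, edges):
    """Unnormalised Laplacian L = D - A of an undirected, unweighted graph."""
    adjacency = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        adjacency[i, j] = 1.0
        adjacency[j, i] = 1.0
    degree = np.diag(adjacency.sum(axis=1))  # node degrees on the diagonal
    return degree - adjacency

# Hypothetical toy network: a path 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
laplacian = graph_laplacian(4, edges)
```

Each row of this matrix sums to zero and the matrix is positive semidefinite, which is why Laplacian-based Gaussian priors tend to favour coefficients that vary smoothly across neighbouring nodes.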
el Scalable Bayesian Inference for the Inverse Temperature of a Hidden Potts Model By projecteuclid.org Published On :: Mon, 13 Jan 2020 04:00 EST Matthew Moores, Geoff Nicholls, Anthony Pettitt, Kerrie Mengersen. Source: Bayesian Analysis, Volume 15, Number 1, 1--27.Abstract: The inverse temperature parameter of the Potts model governs the strength of spatial cohesion and therefore has a major influence over the resulting model fit. A difficulty arises from the dependence of an intractable normalising constant on the value of this parameter and thus there is no closed-form solution for sampling from the posterior distribution directly. There is a variety of computational approaches for sampling from the posterior without evaluating the normalising constant, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space, such as images with a million or more pixels. We introduce a parametric surrogate model, which approximates the score function using an integral curve. Our surrogate model incorporates known properties of the likelihood, such as heteroskedasticity and critical temperature. We demonstrate this method using synthetic data as well as remotely-sensed imagery from the Landsat-8 satellite. We achieve up to a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. An open-source implementation of our algorithm is available in the R package bayesImageS. Full Article
el Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling By projecteuclid.org Published On :: Thu, 19 Dec 2019 22:10 EST Andrea Cremaschi, Raffaele Argiento, Katherine Shoemaker, Christine Peterson, Marina Vannucci. Source: Bayesian Analysis, Volume 14, Number 4, 1271--1301.Abstract: Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate $t$-distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet $t$-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas. Full Article
el Spatial Disease Mapping Using Directed Acyclic Graph Auto-Regressive (DAGAR) Models By projecteuclid.org Published On :: Thu, 19 Dec 2019 22:10 EST Abhirup Datta, Sudipto Banerjee, James S. Hodges, Leiwen Gao. Source: Bayesian Analysis, Volume 14, Number 4, 1221--1244.Abstract: Hierarchical models for regionally aggregated disease incidence data commonly involve region specific latent random effects that are modeled jointly as having a multivariate Gaussian distribution. The covariance or precision matrix incorporates the spatial dependence between the regions. Common choices for the precision matrix include the widely used ICAR model, which is singular, and its nonsingular extension which lacks interpretability. We propose a new parametric model for the precision matrix based on a directed acyclic graph (DAG) representation of the spatial dependence. Our model guarantees positive definiteness and, hence, in addition to being a valid prior for regional spatially correlated random effects, can also directly model the outcome from dependent data like images and networks. Theoretical results establish a link between the parameters in our model and the variance and covariances of the random effects. Simulation studies demonstrate that the improved interpretability of our model reaps benefits in terms of accurately recovering the latent spatial random effects as well as for inference on the spatial covariance parameters. Under modest spatial correlation, our model far outperforms the CAR models, while the performances are similar when the spatial correlation is strong. We also assess sensitivity to the choice of the ordering in the DAG construction using theoretical and empirical results which testify to the robustness of our model. We also present a large-scale public health application demonstrating the competitive performance of the model. Full Article
el Estimating the Use of Public Lands: Integrated Modeling of Open Populations with Convolution Likelihood Ecological Abundance Regression By projecteuclid.org Published On :: Thu, 19 Dec 2019 22:10 EST Lutz F. Gruber, Erica F. Stuber, Lyndsie S. Wszola, Joseph J. Fontaine. Source: Bayesian Analysis, Volume 14, Number 4, 1173--1199.Abstract: We present an integrated open population model where the population dynamics are defined by a differential equation, and the related statistical model utilizes a Poisson binomial convolution likelihood. Key advantages of the proposed approach over existing open population models include the flexibility to predict related, but unobserved quantities such as total immigration or emigration over a specified time period, and more computationally efficient posterior simulation by elimination of the need to explicitly simulate latent immigration and emigration. The viability of the proposed method is shown in an in-depth analysis of outdoor recreation participation on public lands, where the surveyed populations changed rapidly and demographic population closure cannot be assumed even within a single day. Full Article
el Variance Prior Forms for High-Dimensional Bayesian Variable Selection By projecteuclid.org Published On :: Thu, 19 Dec 2019 22:10 EST Gemma E. Moran, Veronika Ročková, Edward I. George. Source: Bayesian Analysis, Volume 14, Number 4, 1091--1119.Abstract: Consider the problem of high dimensional variable selection for the Gaussian linear model when the unknown error variance is also of interest. In this paper, we show that the use of conjugate shrinkage priors for Bayesian variable selection can have detrimental consequences for such variance estimation. Such priors are often motivated by the invariance argument of Jeffreys (1961). Revisiting this work, however, we highlight a caveat that Jeffreys himself noticed; namely that biased estimators can result from inducing dependence between parameters a priori. In a similar way, we show that conjugate priors for linear regression, which induce prior dependence, can lead to such underestimation in the Bayesian high-dimensional regression setting. Following Jeffreys, we recommend as a remedy to treat regression coefficients and the error variance as independent a priori. Using such an independence prior framework, we extend the Spike-and-Slab Lasso of Ročková and George (2018) to the unknown variance case. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on simulated data. On the protein activity dataset of Clyde and Parmigiani (1998), the Spike-and-Slab Lasso with unknown variance achieves lower cross-validation error than alternative penalized likelihood methods, demonstrating the gains in predictive accuracy afforded by simultaneous error variance estimation. The unknown variance implementation of the Spike-and-Slab Lasso is provided in the publicly available R package SSLASSO (Ročková and Moran, 2017). Full Article
el Beyond Whittle: Nonparametric Correction of a Parametric Likelihood with a Focus on Bayesian Time Series Analysis By projecteuclid.org Published On :: Thu, 19 Dec 2019 22:10 EST Claudia Kirch, Matthew C. Edwards, Alexander Meier, Renate Meyer. Source: Bayesian Analysis, Volume 14, Number 4, 1037--1073.Abstract: Nonparametric Bayesian inference has seen rapid growth over the last decade, but only a few nonparametric Bayesian approaches to time series analysis have been developed. Most existing approaches use Whittle’s likelihood for Bayesian modelling of the spectral density as the main nonparametric characteristic of stationary time series. It is known that the loss of efficiency using Whittle’s likelihood can be substantial. On the other hand, parametric methods are more powerful than nonparametric methods if the observed time series is close to the considered model class but fail if the model is misspecified. Therefore, we suggest a nonparametric correction of a parametric likelihood that takes advantage of the efficiency of parametric models while mitigating sensitivities through a nonparametric amendment. We use a nonparametric Bernstein polynomial prior on the spectral density with weights induced by a Dirichlet process and prove posterior consistency for Gaussian stationary time series. Bayesian posterior computations are implemented via an MH-within-Gibbs sampler and the performance of the nonparametrically corrected likelihood for Gaussian time series is illustrated in a simulation study and in three astronomy applications, including estimating the spectral density of gravitational wave data from the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO). Full Article
el Bayes Factors for Partially Observed Stochastic Epidemic Models By projecteuclid.org Published On :: Tue, 11 Jun 2019 04:00 EDT Muteb Alharthi, Theodore Kypraios, Philip D. O’Neill. Source: Bayesian Analysis, Volume 14, Number 3, 927--956.Abstract: We consider the problem of model choice for stochastic epidemic models given partial observation of a disease outbreak through time. Our main focus is on the use of Bayes factors. Although Bayes factors have appeared in the epidemic modelling literature before, they can be hard to compute and little attention has been given to fundamental questions concerning their utility. In this paper we derive analytic expressions for Bayes factors given complete observation through time, which suggest practical guidelines for model choice problems. We adapt the power posterior method for computing Bayes factors so as to account for missing data and apply this approach to partially observed epidemics. For comparison, we also explore the use of a deviance information criterion for missing data scenarios. The methods are illustrated via examples involving both simulated and real data. Full Article
el Jointly Robust Prior for Gaussian Stochastic Process in Emulation, Calibration and Variable Selection By projecteuclid.org Published On :: Tue, 11 Jun 2019 04:00 EDT Mengyang Gu. Source: Bayesian Analysis, Volume 14, Number 3, 877--905.Abstract: Gaussian stochastic process (GaSP) has been widely used in two fundamental problems in uncertainty quantification, namely the emulation and calibration of mathematical models. Some objective priors, such as the reference prior, are studied in the context of emulating (approximating) computationally expensive mathematical models. In this work, we introduce a new class of priors, called the jointly robust prior, for both the emulation and calibration. This prior is designed to maintain various advantages from the reference prior. In emulation, the jointly robust prior has an appropriate tail decay rate as the reference prior, and is computationally simpler than the reference prior in parameter estimation. Moreover, the marginal posterior mode estimation with the jointly robust prior can separate the influential and inert inputs in mathematical models, while the reference prior does not have this property. We establish the posterior propriety for a large class of priors in calibration, including the reference prior and jointly robust prior in general scenarios, but the jointly robust prior is preferred because the calibrated mathematical model typically predicts the reality well. The jointly robust prior is used as the default prior in two new R packages, called “RobustGaSP” and “RobustCalibration”, available on CRAN for emulation and calibration, respectively. Full Article
el Probability Based Independence Sampler for Bayesian Quantitative Learning in Graphical Log-Linear Marginal Models By projecteuclid.org Published On :: Tue, 11 Jun 2019 04:00 EDT Ioannis Ntzoufras, Claudia Tarantola, Monia Lupparelli. Source: Bayesian Analysis, Volume 14, Number 3, 797--823.Abstract: We introduce a novel Bayesian approach for quantitative learning for graphical log-linear marginal models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. The likelihood cannot be analytically expressed as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be directly obtained, and Markov Chain Monte Carlo (MCMC) methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for such models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. We exploit a conditional conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset. Full Article
el Semiparametric Multivariate and Multiple Change-Point Modeling By projecteuclid.org Published On :: Tue, 11 Jun 2019 04:00 EDT Stefano Peluso, Siddhartha Chib, Antonietta Mira. Source: Bayesian Analysis, Volume 14, Number 3, 727--751.Abstract: We develop a general Bayesian semiparametric change-point model in which separate groups of structural parameters (for example, location and dispersion parameters) can each follow a separate multiple change-point process, driven by time-dependent transition matrices among the latent regimes. The distribution of the observations within regimes is unknown and given by a Dirichlet process mixture prior. The properties of the proposed model are studied theoretically through the analysis of inter-arrival times and of the number of change-points in a given time interval. The prior-posterior analysis by Markov chain Monte Carlo techniques is developed on a forward-backward algorithm for sampling the various regime indicators. Analysis with simulated data under various scenarios and an application to short-term interest rates are used to show the generality and usefulness of the proposed model. Full Article
el Model Criticism in Latent Space By projecteuclid.org Published On :: Tue, 11 Jun 2019 04:00 EDT Sohan Seth, Iain Murray, Christopher K. I. Williams. Source: Bayesian Analysis, Volume 14, Number 3, 703--725.Abstract: Model criticism is usually carried out by assessing if replicated data generated under the fitted model looks similar to the observed data, see e.g. Gelman, Carlin, Stern, and Rubin (2004, p. 165). This paper presents a method for latent variable models by pulling back the data into the space of latent variables, and carrying out model criticism in that space. Making use of a model's structure enables a more direct assessment of the assumptions made in the prior and likelihood. We demonstrate the method with examples of model criticism in latent space applied to factor analysis, linear dynamical systems and Gaussian processes. Full Article
el Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models By projecteuclid.org Published On :: Tue, 11 Jun 2019 04:00 EDT Yushu Shi, Michael Martens, Anjishnu Banerjee, Purushottam Laud. Source: Bayesian Analysis, Volume 14, Number 3, 677--702.Abstract: Dirichlet process mixture (DPM) models provide flexible modeling for distributions of data as an infinite mixture of distributions from a chosen collection. Specifying priors for these models in individual data contexts can be challenging. In this paper, we introduce a scheme which requires the investigator to specify only simple scaling information. This is used to transform the data to a fixed scale on which a low information prior is constructed. Samples from the posterior with the rescaled data are transformed back for inference on the original scale. The low information prior is selected to provide a wide variety of components for the DPM to generate flexible distributions for the data on the fixed scale. The method can be applied to all DPM models with kernel functions closed under a suitable scaling transformation. Construction of the low information prior, however, is kernel dependent. Using DPM-of-Gaussians and DPM-of-Weibulls models as examples, we show that the method provides accurate estimates of a diverse collection of distributions that includes skewed, multimodal, and highly dispersed members. With the recommended priors, repeated data simulations show performance comparable to that of standard empirical estimates. Finally, we show weak convergence of posteriors with the proposed priors for both kernels considered. Full Article
el Efficient Acquisition Rules for Model-Based Approximate Bayesian Computation By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen. Source: Bayesian Analysis, Volume 14, Number 2, 595--622.Abstract: Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies. Full Article
el Fast Model-Fitting of Bayesian Variable Selection Regression Using the Iterative Complex Factorization Algorithm By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Quan Zhou, Yongtao Guan. Source: Bayesian Analysis, Volume 14, Number 2, 573--594.Abstract: Bayesian variable selection regression (BVSR) is able to jointly analyze genome-wide genetic datasets, but the slow computation via Markov chain Monte Carlo (MCMC) hampered its wide-spread usage. Here we present a novel iterative method to solve a special class of linear systems, which can increase the speed of the BVSR model-fitting tenfold. The iterative method hinges on the complex factorization of the sum of two matrices and the solution path resides in the complex domain (instead of the real domain). Compared to the Gauss-Seidel method, the complex factorization converges almost instantaneously and its error is several orders of magnitude smaller than that of the Gauss-Seidel method. More importantly, the error is always within the pre-specified precision while that of the Gauss-Seidel method is not. For large problems with thousands of covariates, the complex factorization is 10–100 times faster than either the Gauss-Seidel method or the direct method via the Cholesky decomposition. In BVSR, one needs to repetitively solve large penalized regression systems whose design matrices only change slightly between adjacent MCMC steps. This slight change in design matrix enables the adaptation of the iterative complex factorization method. The computational innovation will facilitate the wide-spread use of BVSR in reanalyzing genome-wide association datasets. Full Article
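For readers unfamiliar with the Gauss-Seidel baseline named in the abstract above, the following is a minimal sketch of that classical method applied to a small penalized regression system (X'X + lam I) beta = X'y. It is not the authors' iterative complex factorization. The function names `gauss_seidel` and `ridge_normal_equations`, the tolerance, and the toy data are illustrative assumptions.

```python
def gauss_seidel(A, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Gauss-Seidel sweeps; A is a list of rows.

    Illustrative sketch only: converges for symmetric positive definite A,
    which holds for X'X + lam*I when lam > 0.
    """
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(n):
            # x is updated in place, so entries j < i are already new
            # and entries j > i still hold the previous sweep's values.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            largest_change = max(largest_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if largest_change < tol:
            return x, sweep
    return x, max_iter


def ridge_normal_equations(X, y, lam):
    """Build (X'X + lam*I) and X'y for a penalized least-squares system."""
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    return A, b


# Toy data (hypothetical): three observations, two covariates.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
A, b = ridge_normal_equations(X, y, lam=1.0)
beta, sweeps = gauss_seidel(A, b)
```

In an MCMC setting where the design matrix changes only slightly between steps, one natural variation is to start the sweeps from the previous solution rather than from zero. That warm start is a general property of iterative solvers and is mentioned here only as context for the abstract's remark.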
el A Bayesian Nonparametric Spiked Process Prior for Dynamic Model Selection By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Alberto Cassese, Weixuan Zhu, Michele Guindani, Marina Vannucci. Source: Bayesian Analysis, Volume 14, Number 2, 553--572.Abstract: In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this manuscript, we consider the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States, and propose a Bayesian nonparametric model selection approach to take into account the spatio-temporal dependence of outbreaks. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior which allows borrowing information across time and assigning data to clusters associated with either a null or an alternate process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare with common threshold methods for detecting outbreaks over time, with more recent Markov switching based models, and with spike-and-slab Bayesian nonparametric priors that do not take into account spatio-temporal dependence. Full Article
el Bayes Factor Testing of Multiple Intraclass Correlations By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Joris Mulder, Jean-Paul Fox. Source: Bayesian Analysis, Volume 14, Number 2, 521--552.Abstract: The intraclass correlation plays a central role in modeling hierarchically structured data, such as educational data, panel data, or group-randomized trial data. It represents relevant information concerning the between-group and within-group variation. Methods for Bayesian hypothesis tests concerning the intraclass correlation are proposed to improve decision making in hierarchical data analysis and to assess the grouping effect across different group categories. Estimation and testing methods for the intraclass correlation coefficient are proposed under a marginal modeling framework where the random effects are integrated out. A class of stretched beta priors is proposed on the intraclass correlations, which is equivalent to shifted $F$ priors for the between groups variances. Through a parameter expansion it is shown that this prior is conditionally conjugate under the marginal model yielding efficient posterior computation. A special improper case results in accurate coverage rates of the credible intervals even for minimal sample size and when the true intraclass correlation equals zero. Bayes factor tests are proposed for testing multiple precise and order hypotheses on intraclass correlations. These tests can be used when prior information about the intraclass correlations is available or absent. For the noninformative case, a generalized fractional Bayes approach is developed. The method enables testing the presence and strength of grouped data structures without introducing random effects. The methodology is applied to a large-scale survey study on international mathematics achievement at fourth grade to test the heterogeneity in the clustering of students in schools across countries and assessment cycles. Full Article
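As background for the abstract above: in the simplest random-intercept setting, the intraclass correlation is the share of total variance attributable to between-group variation. The sketch below computes the classical one-way ANOVA moment estimate for balanced groups. It is not the Bayes factor methodology of the paper, and the function name `anova_icc` and the balanced-design restriction are assumptions made for illustration.

```python
def anova_icc(groups):
    """Classical one-way ANOVA estimate of the intraclass correlation.

    `groups` is a list of equally sized lists of observations.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the group size.
    """
    m = len(groups)      # number of groups
    k = len(groups[0])   # observations per group
    if m < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 balanced groups of size >= 2")

    grand_mean = sum(sum(g) for g in groups) / (m * k)
    group_means = [sum(g) / k for g in groups]

    # Between-group and within-group mean squares.
    msb = k * sum((mu - grand_mean) ** 2 for mu in group_means) / (m - 1)
    msw = sum((v - mu) ** 2
              for g, mu in zip(groups, group_means)
              for v in g) / (m * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# All variation is between groups, so the estimate equals 1.
icc_between = anova_icc([[1.0, 1.0], [3.0, 3.0]])
# All variation is within groups; with k = 2 the estimate reaches its lower bound of -1.
icc_within = anova_icc([[1.0, 3.0], [1.0, 3.0]])
```

The second example shows that the moment estimate can be negative, which is one reason a marginal model that does not force a nonnegative random-effect variance can be convenient when testing whether the correlation is zero.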
el Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Łukasz Rajkowski. Source: Bayesian Analysis, Volume 14, Number 2, 477--494.Abstract: Mixture models are a natural choice in many applications, but it can be difficult to place an a priori upper bound on the number of components. To circumvent this, investigators are turning increasingly to Dirichlet process mixture models (DPMMs). It is therefore important to develop an understanding of the strengths and weaknesses of this approach. This work considers the MAP (maximum a posteriori) clustering for the Gaussian DPMM (where the cluster means have Gaussian distribution and, for each cluster, the observations within the cluster have Gaussian distribution). Some desirable properties of the MAP partition are proved: ‘almost disjointness’ of the convex hulls of clusters (they may have at most one point in common) and (with natural assumptions) the comparability of sizes of those clusters that intersect any fixed ball with the number of observations (as the latter goes to infinity). Consequently, the number of such clusters remains bounded. Furthermore, if the data arises from independent identically distributed sampling from a given distribution with bounded support then the asymptotic MAP partition of the observation space maximises a function which has a straightforward expression, which depends only on the within-group covariance parameter. As the operator norm of this covariance parameter decreases, the number of clusters in the MAP partition becomes arbitrarily large, which may lead to the overestimation of the number of mixture components. Full Article
el Efficient Bayesian Regularization for Graphical Model Selection By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Suprateek Kundu, Bani K. Mallick, Veera Baladandayuthapani. Source: Bayesian Analysis, Volume 14, Number 2, 449--476.Abstract: There has been an intense development in the Bayesian graphical model literature over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse–Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates. Subsequently, a post-fitting model selection step uses penalized joint credible regions to perform model selection. This allows our methods to be computationally feasible for large dimensional settings using a combination of straightforward Gibbs samplers and efficient post-fitting inferences. Theoretical guarantees in terms of selection consistency are also established. Simulations show that the proposed approach compares favorably with competing methods, both in terms of accuracy metrics and computation times. We apply this approach to a cancer genomics data example. Full Article
el Variational Message Passing for Elaborate Response Regression Models By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT M. W. McLean, M. P. Wand. Source: Bayesian Analysis, Volume 14, Number 2, 371--398.Abstract: We build on recent work concerning message passing approaches to approximate fitting and inference for arbitrarily large regression models. The focus is on regression models where the response variable is modeled to have an elaborate distribution, which is loosely defined to mean a distribution that is more complicated than common distributions such as those in the Bernoulli, Poisson and Normal families. Examples of elaborate response families considered here are the Negative Binomial and $t$ families. Variational message passing is more challenging due to some of the conjugate exponential families being non-standard and numerical integration being needed. Nevertheless, a factor graph fragment approach means the requisite calculations only need to be done once for a particular elaborate response distribution family. Computer code can be compartmentalized, including that involving numerical integration. A major finding of this work is that the modularity of variational message passing extends to elaborate response regression models. Full Article
el Modeling Population Structure Under Hierarchical Dirichlet Processes By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh. Source: Bayesian Analysis, Volume 14, Number 2, 313--339.Abstract: We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods. Full Article
el Separable covariance arrays via the Tucker product, with applications to multivariate relational data By projecteuclid.org Published On :: Wed, 13 Jun 2012 14:27 EDT Peter D. Hoff. Source: Bayesian Anal., Volume 6, Number 2, 179--196.Abstract: Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we discuss an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We show how a particular array-matrix product can be used to generate the class of array normal distributions having separable covariance structure. We derive some properties of these covariance structures and the corresponding array normal distributions, and show how the array-matrix product can be used to define a semi-conjugate prior distribution and calculate the corresponding posterior distribution. We illustrate the methodology in an analysis of multivariate longitudinal network data which take the form of a four-way array. Full Article
el Data Denoising and Post-Denoising Corrections in Single Cell RNA Sequencing By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Divyansh Agarwal, Jingshu Wang, Nancy R. Zhang. Source: Statistical Science, Volume 35, Number 1, 112--128.Abstract: Single cell sequencing technologies are transforming biomedical research. However, due to the inherent nature of the data, single cell RNA sequencing analysis poses new computational and statistical challenges. We begin with a survey of a selection of topics in this field, with a gentle introduction to the biology and a more detailed exploration of the technical noise. We consider in detail the problem of single cell data denoising, sometimes referred to as “imputation” in the relevant literature. We discuss why this is not a typical statistical imputation problem, and review current approaches to this problem. We then explore why the use of denoised values in downstream analyses invites novel statistical insights, and how denoising uncertainty should be accounted for to yield valid statistical inference. The utilization of denoised or imputed matrices in statistical inference is not unique to single cell genomics, and arises in many other fields. We describe the challenges in this type of analysis, discuss some preliminary solutions, and highlight unresolved issues. Full Article
el A Tale of Two Parasites: Statistical Modelling to Support Disease Control Programmes in Africa By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Peter J. Diggle, Emanuele Giorgi, Julienne Atsame, Sylvie Ntsame Ella, Kisito Ogoussan, Katherine Gass. Source: Statistical Science, Volume 35, Number 1, 42--50.Abstract: Vector-borne diseases have long presented major challenges to the health of rural communities in the wet tropical regions of the world, but especially in sub-Saharan Africa. In this paper, we describe the contribution that statistical modelling has made to the global elimination programme for one vector-borne disease, onchocerciasis. We explain why information on the spatial distribution of a second vector-borne disease, Loa loa, is needed before communities at high risk of onchocerciasis can be treated safely with mass distribution of ivermectin, an antifilarial medication. We show how a model-based geostatistical analysis of Loa loa prevalence survey data can be used to map the predictive probability that each location in the region of interest meets a WHO policy guideline for safe mass distribution of ivermectin and describe two applications: one is to data from Cameroon that assesses prevalence using traditional blood-smear microscopy; the other is to Africa-wide data that uses a low-cost questionnaire-based method. We describe how a recent technological development in image-based microscopy has resulted in a change of emphasis from prevalence alone to the bivariate spatial distribution of prevalence and the intensity of infection among infected individuals. We discuss how statistical modelling of the kind described here can contribute to health policy guidelines and decision-making in two ways. One is to ensure that, in a resource-limited setting, prevalence surveys are designed, and the resulting data analysed, as efficiently as possible. The other is to provide an honest quantification of the uncertainty attached to any binary decision by reporting predictive probabilities that a policy-defined condition for action is or is not met. Full Article
el Risk Models for Breast Cancer and Their Validation By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Adam R. Brentnall, Jack Cuzick. Source: Statistical Science, Volume 35, Number 1, 14--30.Abstract: Strategies to prevent cancer and diagnose it early when it is most treatable are needed to reduce the public health burden from rising disease incidence. Risk assessment is playing an increasingly important role in targeting individuals in need of such interventions. For breast cancer many individual risk factors have been well understood for a long time, but the development of a fully comprehensive risk model has not been straightforward, in part because there have been limited data where joint effects of an extensive set of risk factors may be estimated with precision. In this article we first review the approach taken to develop the IBIS (Tyrer–Cuzick) model, and describe recent updates. We then review and develop methods to assess calibration of models such as this one, where the risk of disease allowing for competing mortality over a long follow-up time or lifetime is estimated. The breast cancer risk model and calibration assessment methods are demonstrated using a cohort of 132,139 women attending mammography screening in the State of Washington, USA. Full Article
el Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong. Source: Statistical Science, Volume 35, Number 1, 2--13.Abstract: Unsupervised methods, including clustering methods, are essential to the analysis of single-cell genomic data. Model-based clustering methods are under-explored in the area of single-cell genomics, and have the advantage of quantifying the uncertainty of the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that, by combining these two types of data, we can achieve a better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed. Full Article
el Gaussianization Machines for Non-Gaussian Function Estimation Models By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST T. Tony Cai. Source: Statistical Science, Volume 34, Number 4, 635--656.Abstract: A wide range of nonparametric function estimation models have been studied individually in the literature. Among them the homoscedastic nonparametric Gaussian regression is arguably the best known and understood. Inspired by the asymptotic equivalence theory, Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433) developed a unified approach to turn a collection of non-Gaussian function estimation models into a standard Gaussian regression and any good Gaussian nonparametric regression method can then be used. These Gaussianization Machines have two key components, binning and transformation. When combined with BlockJS, a wavelet thresholding procedure for Gaussian regression, the procedures are computationally efficient with strong theoretical guarantees. Technical analysis given in Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433) shows that the estimators attain the optimal rate of convergence adaptively over a large set of Besov spaces and across a collection of non-Gaussian function estimation models, including robust nonparametric regression, density estimation, and nonparametric regression in exponential families. The estimators are also spatially adaptive. The Gaussianization Machines significantly extend the flexibility and scope of the theories and methodologies originally developed for the conventional nonparametric Gaussian regression. This article aims to provide a concise account of the Gaussianization Machines developed in Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046), Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433). Full Article
el Models as Approximations—Rejoinder By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Andreas Buja, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Eric Tchetgen Tchetgen, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 606--620.Abstract: We respond to the discussants of our articles emphasizing the importance of inference under misspecification in the context of the reproducibility/replicability crisis. Along the way, we discuss the roles of diagnostics and model building in regression as well as connections between our well-specification framework and semiparametric theory. Full Article
el Discussion: Models as Approximations By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Dalia Ghanem, Todd A. Kuffner. Source: Statistical Science, Volume 34, Number 4, 604--605. Full Article
el Comment: Models as (Deliberate) Approximations By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST David Whitney, Ali Shojaie, Marco Carone. Source: Statistical Science, Volume 34, Number 4, 591--598. Full Article
el Comment: Models Are Approximations! By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Anthony C. Davison, Erwan Koch, Jonathan Koh. Source: Statistical Science, Volume 34, Number 4, 584--590.Abstract: This discussion focuses on areas of disagreement with the papers, particularly the target of inference and the case for using the robust ‘sandwich’ variance estimator in the presence of moderate mis-specification. We also suggest that existing procedures may be appreciably more powerful for detecting mis-specification than the authors’ RAV statistic, and comment on the use of the pairs bootstrap in balanced situations. Full Article
el Comment: “Models as Approximations I: Consequences Illustrated with Linear Regression” by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, L. Zhan and K. Zhang By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Roderick J. Little. Source: Statistical Science, Volume 34, Number 4, 580--583. Full Article
el Discussion of Models as Approximations I & II By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Dag Tjøstheim. Source: Statistical Science, Volume 34, Number 4, 575--579. Full Article
el Comment: Models as Approximations By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Nikki L. B. Freeman, Xiaotong Jiang, Owen E. Leete, Daniel J. Luckett, Teeranan Pokaprakarn, Michael R. Kosorok. Source: Statistical Science, Volume 34, Number 4, 572--574. Full Article
el Comment on Models as Approximations, Parts I and II, by Buja et al. By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Jerald F. Lawless. Source: Statistical Science, Volume 34, Number 4, 569--571.Abstract: I comment on the papers Models as Approximations I and II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang. Full Article
el Discussion of Models as Approximations I & II By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Sara van de Geer. Source: Statistical Science, Volume 34, Number 4, 566--568.Abstract: We discuss the papers “Models as Approximations” I & II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang (Part I) and A. Buja, L. Brown, A. K. Kuchibhotla, R. Berk, E. George and L. Zhao (Part II). We present a summary with some details for the generalized linear model. Full Article