pr

Bayesian Functional Forecasting with Locally-Autoregressive Dependent Processes

Guillaume Kon Kam King, Antonio Canale, Matteo Ruggiero.

Source: Bayesian Analysis, Volume 14, Number 4, 1121--1141.

Abstract:
Motivated by the problem of forecasting demand and offer curves, we introduce a class of nonparametric dynamic models with locally-autoregressive behaviour, and provide a full inferential strategy for forecasting time series of piecewise-constant non-decreasing functions over arbitrary time horizons. The model is induced by a non Markovian system of interacting particles whose evolution is governed by a resampling step and a drift mechanism. The former is based on a global interaction and accounts for the volatility of the functional time series, while the latter is determined by a neighbourhood-based interaction with the past curves and accounts for local trend behaviours, separating these from pure noise. We discuss the implementation of the model for functional forecasting by combining a population Monte Carlo and a semi-automatic learning approach to approximate Bayesian computation which require limited tuning. We validate the inference method with a simulation study, and carry out predictive inference on a real dataset on the Italian natural gas market.




pr

Variance Prior Forms for High-Dimensional Bayesian Variable Selection

Gemma E. Moran, Veronika Ročková, Edward I. George.

Source: Bayesian Analysis, Volume 14, Number 4, 1091--1119.

Abstract:
Consider the problem of high dimensional variable selection for the Gaussian linear model when the unknown error variance is also of interest. In this paper, we show that the use of conjugate shrinkage priors for Bayesian variable selection can have detrimental consequences for such variance estimation. Such priors are often motivated by the invariance argument of Jeffreys (1961). Revisiting this work, however, we highlight a caveat that Jeffreys himself noticed; namely that biased estimators can result from inducing dependence between parameters a priori . In a similar way, we show that conjugate priors for linear regression, which induce prior dependence, can lead to such underestimation in the Bayesian high-dimensional regression setting. Following Jeffreys, we recommend as a remedy to treat regression coefficients and the error variance as independent a priori . Using such an independence prior framework, we extend the Spike-and-Slab Lasso of Ročková and George (2018) to the unknown variance case. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on simulated data. On the protein activity dataset of Clyde and Parmigiani (1998), the Spike-and-Slab Lasso with unknown variance achieves lower cross-validation error than alternative penalized likelihood methods, demonstrating the gains in predictive accuracy afforded by simultaneous error variance estimation. The unknown variance implementation of the Spike-and-Slab Lasso is provided in the publicly available R package SSLASSO (Ročková and Moran, 2017).




pr

Post-Processing Posteriors Over Precision Matrices to Produce Sparse Graph Estimates

Amir Bashir, Carlos M. Carvalho, P. Richard Hahn, M. Beatrix Jones.

Source: Bayesian Analysis, Volume 14, Number 4, 1075--1090.

Abstract:
A variety of computationally efficient Bayesian models for the covariance matrix of a multivariate Gaussian distribution are available. However, all produce a relatively dense estimate of the precision matrix, and are therefore unsatisfactory when one wishes to use the precision matrix to consider the conditional independence structure of the data. This paper considers the posterior predictive distribution of model fit for these covariance models. We then undertake post-processing of the Bayes point estimate for the precision matrix to produce a sparse model whose expected fit lies within the upper 95% of the posterior predictive distribution of fit. The impact of the method for selecting the zero elements of the precision matrix is evaluated. Good results were obtained using models that encouraged a sparse posterior (G-Wishart, Bayesian adaptive graphical lasso) and selection using credible intervals. We also find that this approach is easily extended to the problem of finding a sparse set of elements that differ across a set of precision matrices, a natural summary when a common set of variables is observed under multiple conditions. We illustrate our findings with moderate dimensional data examples from finance and metabolomics.




pr

Extrinsic Gaussian Processes for Regression and Classification on Manifolds

Lizhen Lin, Niu Mu, Pokman Cheung, David Dunson.

Source: Bayesian Analysis, Volume 14, Number 3, 907--926.

Abstract:
Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory and algorithms related to GPs, the overwhelming majority of this literature focuses on the case in which the input domain corresponds to a Euclidean space. However, particularly in recent years with the increasing collection of complex data, it is commonly the case that the input domain does not have such a simple form. For example, it is common for the inputs to be restricted to a non-Euclidean manifold, a case which forms the motivation for this article. In particular, we propose a general extrinsic framework for GP modeling on manifolds, which relies on embedding of the manifold into a Euclidean space and then constructing extrinsic kernels for GPs on their images. These extrinsic Gaussian processes (eGPs) are used as prior distributions for unknown functions in Bayesian inferences. Our approach is simple and general, and we show that the eGPs inherit fine theoretical properties from GP models in Euclidean spaces. We consider applications of our models to regression and classification problems with predictors lying in a large class of manifolds, including spheres, planar shape spaces, a space of positive definite matrices, and Grassmannians. Our models can be readily used by practitioners in biological sciences for various regression and classification problems, such as disease diagnosis or detection. Our work is also likely to have impact in spatial statistics when spatial locations are on the sphere or other geometric spaces.




pr

Jointly Robust Prior for Gaussian Stochastic Process in Emulation, Calibration and Variable Selection

Mengyang Gu.

Source: Bayesian Analysis, Volume 14, Number 3, 877--905.

Abstract:
Gaussian stochastic process (GaSP) has been widely used in two fundamental problems in uncertainty quantification, namely the emulation and calibration of mathematical models. Some objective priors, such as the reference prior, are studied in the context of emulating (approximating) computationally expensive mathematical models. In this work, we introduce a new class of priors, called the jointly robust prior, for both the emulation and calibration. This prior is designed to maintain various advantages from the reference prior. In emulation, the jointly robust prior has an appropriate tail decay rate as the reference prior, and is computationally simpler than the reference prior in parameter estimation. Moreover, the marginal posterior mode estimation with the jointly robust prior can separate the influential and inert inputs in mathematical models, while the reference prior does not have this property. We establish the posterior propriety for a large class of priors in calibration, including the reference prior and jointly robust prior in general scenarios, but the jointly robust prior is preferred because the calibrated mathematical model typically predicts the reality well. The jointly robust prior is used as the default prior in two new R packages, called “RobustGaSP” and “RobustCalibration”, available on CRAN for emulation and calibration, respectively.




pr

High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors

Joseph Antonelli, Giovanni Parmigiani, Francesca Dominici.

Source: Bayesian Analysis, Volume 14, Number 3, 825--848.

Abstract:
In observational studies, estimation of a causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of the potential confounders ( $p$ ) is larger than the number of observations ( $n$ ), then direct control for all potential confounders is infeasible. Existing approaches for dimension reduction and penalization are generally aimed at predicting the outcome, and are less suited for estimation of causal effects. Under standard penalization approaches (e.g. Lasso), if a variable $X_{j}$ is strongly associated with the treatment $T$ but weakly with the outcome $Y$ , the coefficient $eta_{j}$ will be shrunk towards zero thus leading to confounding bias. Under the assumption of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients $eta_{j}$ corresponding to the potential confounders $X_{j}$ . Specifically, we introduce a prior distribution that does not heavily shrink to zero the coefficients ( $eta_{j}$ s) of the $X_{j}$ s that are strongly associated with $T$ but weakly associated with $Y$ . We compare our proposed approach to several state of the art methods proposed in the literature. Our proposed approach has the following features: 1) it reduces confounding bias in high dimensional settings; 2) it shrinks towards zero coefficients of instrumental variables; and 3) it achieves good coverages even in small sample sizes. We apply our approach to the National Health and Nutrition Examination Survey (NHANES) data to estimate the causal effects of persistent pesticide exposure on triglyceride levels.




pr

Probability Based Independence Sampler for Bayesian Quantitative Learning in Graphical Log-Linear Marginal Models

Ioannis Ntzoufras, Claudia Tarantola, Monia Lupparelli.

Source: Bayesian Analysis, Volume 14, Number 3, 797--823.

Abstract:
We introduce a novel Bayesian approach for quantitative learning for graphical log-linear marginal models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. The likelihood cannot be analytically expressed as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be directly obtained, and Markov Chain Monte Carlo (MCMC) methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for such models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. We exploit a conditional conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset.




pr

Sequential Monte Carlo Samplers with Independent Markov Chain Monte Carlo Proposals

L. F. South, A. N. Pettitt, C. C. Drovandi.

Source: Bayesian Analysis, Volume 14, Number 3, 773--796.

Abstract:
Sequential Monte Carlo (SMC) methods for sampling from the posterior of static Bayesian models are flexible, parallelisable and capable of handling complex targets. However, it is common practice to adopt a Markov chain Monte Carlo (MCMC) kernel with a multivariate normal random walk (RW) proposal in the move step, which can be both inefficient and detrimental for exploring challenging posterior distributions. We develop new SMC methods with independent proposals which allow recycling of all candidates generated in the SMC process and are embarrassingly parallelisable. A novel evidence estimator that is easily computed from the output of our independent SMC is proposed. Our independent proposals are constructed via flexible copula-type models calibrated with the population of SMC particles. We demonstrate through several examples that more precise estimates of posterior expectations and the marginal likelihood can be obtained using fewer likelihood evaluations than the more standard RW approach.




pr

Stochastic Approximations to the Pitman–Yor Process

Julyan Arbel, Pierpaolo De Blasi, Igor Prünster.

Source: Bayesian Analysis, Volume 14, Number 3, 753--771.

Abstract:
In this paper we consider approximations to the popular Pitman–Yor process obtained by truncating the stick-breaking representation. The truncation is determined by a random stopping rule that achieves an almost sure control on the approximation error in total variation distance. We derive the asymptotic distribution of the random truncation point as the approximation error $epsilon$ goes to zero in terms of a polynomially tilted positive stable random variable. The practical usefulness and effectiveness of this theoretical result is demonstrated by devising a sampling algorithm to approximate functionals of the $epsilon$ -version of the Pitman–Yor process.




pr

Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models

Yushu Shi, Michael Martens, Anjishnu Banerjee, Purushottam Laud.

Source: Bayesian Analysis, Volume 14, Number 3, 677--702.

Abstract:
Dirichlet process mixture (DPM) models provide flexible modeling for distributions of data as an infinite mixture of distributions from a chosen collection. Specifying priors for these models in individual data contexts can be challenging. In this paper, we introduce a scheme which requires the investigator to specify only simple scaling information. This is used to transform the data to a fixed scale on which a low information prior is constructed. Samples from the posterior with the rescaled data are transformed back for inference on the original scale. The low information prior is selected to provide a wide variety of components for the DPM to generate flexible distributions for the data on the fixed scale. The method can be applied to all DPM models with kernel functions closed under a suitable scaling transformation. Construction of the low information prior, however, is kernel dependent. Using DPM-of-Gaussians and DPM-of-Weibulls models as examples, we show that the method provides accurate estimates of a diverse collection of distributions that includes skewed, multimodal, and highly dispersed members. With the recommended priors, repeated data simulations show performance comparable to that of standard empirical estimates. Finally, we show weak convergence of posteriors with the proposed priors for both kernels considered.




pr

A Bayesian Nonparametric Multiple Testing Procedure for Comparing Several Treatments Against a Control

Luis Gutiérrez, Andrés F. Barrientos, Jorge González, Daniel Taylor-Rodríguez.

Source: Bayesian Analysis, Volume 14, Number 2, 649--675.

Abstract:
We propose a Bayesian nonparametric strategy to test for differences between a control group and several treatment regimes. Most of the existing tests for this type of comparison are based on the differences between location parameters. In contrast, our approach identifies differences across the entire distribution, avoids strong modeling assumptions over the distributions for each treatment, and accounts for multiple testing through the prior distribution on the space of hypotheses. The proposal is compared to other commonly used hypothesis testing procedures under simulated scenarios. Two real applications are also analyzed with the proposed methodology.




pr

Alleviating Spatial Confounding for Areal Data Problems by Displacing the Geographical Centroids

Marcos Oliveira Prates, Renato Martins Assunção, Erica Castilho Rodrigues.

Source: Bayesian Analysis, Volume 14, Number 2, 623--647.

Abstract:
Spatial confounding between the spatial random effects and fixed effects covariates has been recently discovered and showed that it may bring misleading interpretation to the model results. Techniques to alleviate this problem are based on decomposing the spatial random effect and fitting a restricted spatial regression. In this paper, we propose a different approach: a transformation of the geographic space to ensure that the unobserved spatial random effect added to the regression is orthogonal to the fixed effects covariates. Our approach, named SPOCK, has the additional benefit of providing a fast and simple computational method to estimate the parameters. Also, it does not constrain the distribution class assumed for the spatial error term. A simulation study and real data analyses are presented to better understand the advantages of the new method in comparison with the existing ones.




pr

Efficient Acquisition Rules for Model-Based Approximate Bayesian Computation

Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen.

Source: Bayesian Analysis, Volume 14, Number 2, 595--622.

Abstract:
Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies.




pr

A Bayesian Nonparametric Spiked Process Prior for Dynamic Model Selection

Alberto Cassese, Weixuan Zhu, Michele Guindani, Marina Vannucci.

Source: Bayesian Analysis, Volume 14, Number 2, 553--572.

Abstract:
In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this manuscript, we consider the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States, and propose a Bayesian nonparametric model selection approach to take into account the spatio-temporal dependence of outbreaks. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior which allows borrowing information across time and to assign data to clusters associated to either a null or an alternate process. Spatial dependences are accounted for by means of a Markov random field prior, which allows to inform the selection based on inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare with common threshold methods for detecting outbreaks over time, with more recent Markov switching based models, and with spike-and-slab Bayesian nonparametric priors that do not take into account spatio-temporal dependence.




pr

Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model

Łukasz Rajkowski.

Source: Bayesian Analysis, Volume 14, Number 2, 477--494.

Abstract:
Mixture models are a natural choice in many applications, but it can be difficult to place an a priori upper bound on the number of components. To circumvent this, investigators are turning increasingly to Dirichlet process mixture models (DPMMs). It is therefore important to develop an understanding of the strengths and weaknesses of this approach. This work considers the MAP (maximum a posteriori) clustering for the Gaussian DPMM (where the cluster means have Gaussian distribution and, for each cluster, the observations within the cluster have Gaussian distribution). Some desirable properties of the MAP partition are proved: ‘almost disjointness’ of the convex hulls of clusters (they may have at most one point in common) and (with natural assumptions) the comparability of sizes of those clusters that intersect any fixed ball with the number of observations (as the latter goes to infinity). Consequently, the number of such clusters remains bounded. Furthermore, if the data arises from independent identically distributed sampling from a given distribution with bounded support then the asymptotic MAP partition of the observation space maximises a function which has a straightforward expression, which depends only on the within-group covariance parameter. As the operator norm of this covariance parameter decreases, the number of clusters in the MAP partition becomes arbitrarily large, which may lead to the overestimation of the number of mixture components.




pr

A Bayesian Approach to Statistical Shape Analysis via the Projected Normal Distribution

Luis Gutiérrez, Eduardo Gutiérrez-Peña, Ramsés H. Mena.

Source: Bayesian Analysis, Volume 14, Number 2, 427--447.

Abstract:
This work presents a Bayesian predictive approach to statistical shape analysis. A modeling strategy that starts with a Gaussian distribution on the configuration space, and then removes the effects of location, rotation and scale, is studied. This boils down to an application of the projected normal distribution to model the configurations in the shape space, which together with certain identifiability constraints, facilitates parameter interpretation. Having better control over the parameters allows us to generalize the model to a regression setting where the effect of predictors on shapes can be considered. The methodology is illustrated and tested using both simulated scenarios and a real data set concerning eight anatomical landmarks on a sagittal plane of the corpus callosum in patients with autism and in a group of controls.




pr

Bayesian Effect Fusion for Categorical Predictors

Daniela Pauger, Helga Wagner.

Source: Bayesian Analysis, Volume 14, Number 2, 341--369.

Abstract:
We propose a Bayesian approach to obtain a sparse representation of the effect of a categorical predictor in regression type models. As this effect is captured by a group of level effects, sparsity cannot only be achieved by excluding single irrelevant level effects or the whole group of effects associated to this predictor but also by fusing levels which have essentially the same effect on the response. To achieve this goal, we propose a prior which allows for almost perfect as well as almost zero dependence between level effects a priori. This prior can alternatively be obtained by specifying spike and slab prior distributions on all effect differences associated to this categorical predictor. We show how restricted fusion can be implemented and develop an efficient MCMC (Markov chain Monte Carlo) method for posterior computation. The performance of the proposed method is investigated on simulated data and we illustrate its application on real data from EU-SILC (European Union Statistics on Income and Living Conditions).




pr

Modeling Population Structure Under Hierarchical Dirichlet Processes

Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh.

Source: Bayesian Analysis, Volume 14, Number 2, 313--339.

Abstract:
We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods.




pr

Separable covariance arrays via the Tucker product, with applications to multivariate relational data

Peter D. Hoff

Source: Bayesian Anal., Volume 6, Number 2, 179--196.

Abstract:
Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we discuss an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We show how a particular array-matrix product can be used to generate the class of array normal distributions having separable covariance structure. We derive some properties of these covariance structures and the corresponding array normal distributions, and show how the array-matrix product can be used to define a semi-conjugate prior distribution and calculate the corresponding posterior distribution. We illustrate the methodology in an analysis of multivariate longitudinal network data which take the form of a four-way array.




pr

A Tale of Two Parasites: Statistical Modelling to Support Disease Control Programmes in Africa

Peter J. Diggle, Emanuele Giorgi, Julienne Atsame, Sylvie Ntsame Ella, Kisito Ogoussan, Katherine Gass.

Source: Statistical Science, Volume 35, Number 1, 42--50.

Abstract:
Vector-borne diseases have long presented major challenges to the health of rural communities in the wet tropical regions of the world, but especially in sub-Saharan Africa. In this paper, we describe the contribution that statistical modelling has made to the global elimination programme for one vector-borne disease, onchocerciasis. We explain why information on the spatial distribution of a second vector-borne disease, Loa loa, is needed before communities at high risk of onchocerciasis can be treated safely with mass distribution of ivermectin, an antifiarial medication. We show how a model-based geostatistical analysis of Loa loa prevalence survey data can be used to map the predictive probability that each location in the region of interest meets a WHO policy guideline for safe mass distribution of ivermectin and describe two applications: one is to data from Cameroon that assesses prevalence using traditional blood-smear microscopy; the other is to Africa-wide data that uses a low-cost questionnaire-based method. We describe how a recent technological development in image-based microscopy has resulted in a change of emphasis from prevalence alone to the bivariate spatial distribution of prevalence and the intensity of infection among infected individuals. We discuss how statistical modelling of the kind described here can contribute to health policy guidelines and decision-making in two ways. One is to ensure that, in a resource-limited setting, prevalence surveys are designed, and the resulting data analysed, as efficiently as possible. The other is to provide an honest quantification of the uncertainty attached to any binary decision by reporting predictive probabilities that a policy-defined condition for action is or is not met.




pr

Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression

Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong.

Source: Statistical Science, Volume 35, Number 1, 2--13.

Abstract:
Unsupervised methods, including clustering methods, are essential to the analysis of single-cell genomic data. Model-based clustering methods are under-explored in the area of single-cell genomics, and have the advantage of quantifying the uncertainty of the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that combining these two types of data, we can achieve a better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed.




pr

Models as Approximations—Rejoinder

Andreas Buja, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Eric Tchetgen Tchetgen, Linda Zhao.

Source: Statistical Science, Volume 34, Number 4, 606--620.

Abstract:
We respond to the discussants of our articles emphasizing the importance of inference under misspecification in the context of the reproducibility/replicability crisis. Along the way, we discuss the roles of diagnostics and model building in regression as well as connections between our well-specification framework and semiparametric theory.




pr

Discussion: Models as Approximations

Dalia Ghanem, Todd A. Kuffner.

Source: Statistical Science, Volume 34, Number 4, 604--605.




pr

Comment: Statistical Inference from a Predictive Perspective

Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman.

Source: Statistical Science, Volume 34, Number 4, 599--603.

Abstract:
What is the meaning of a regression parameter? Why is this the de facto standard object of interest for statistical inference? These are delicate issues, especially when the model is misspecified. We argue that focusing on predictive quantities may be a desirable alternative.




pr

Comment: Models as (Deliberate) Approximations

David Whitney, Ali Shojaie, Marco Carone.

Source: Statistical Science, Volume 34, Number 4, 591--598.




pr

Comment: Models Are Approximations!

Anthony C. Davison, Erwan Koch, Jonathan Koh.

Source: Statistical Science, Volume 34, Number 4, 584--590.

Abstract:
This discussion focuses on areas of disagreement with the papers, particularly the target of inference and the case for using the robust ‘sandwich’ variance estimator in the presence of moderate mis-specification. We also suggest that existing procedures may be appreciably more powerful for detecting mis-specification than the authors’ RAV statistic, and comment on the use of the pairs bootstrap in balanced situations.




pr

Comment: “Models as Approximations I: Consequences Illustrated with Linear Regression” by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, L. Zhan and K. Zhang

Roderick J. Little.

Source: Statistical Science, Volume 34, Number 4, 580--583.




pr

Discussion of Models as Approximations I & II

Dag Tjøstheim.

Source: Statistical Science, Volume 34, Number 4, 575--579.




pr

Comment: Models as Approximations

Nikki L. B. Freeman, Xiaotong Jiang, Owen E. Leete, Daniel J. Luckett, Teeranan Pokaprakarn, Michael R. Kosorok.

Source: Statistical Science, Volume 34, Number 4, 572--574.




pr

Comment on Models as Approximations, Parts I and II, by Buja et al.

Jerald F. Lawless.

Source: Statistical Science, Volume 34, Number 4, 569--571.

Abstract:
I comment on the papers Models as Approximations I and II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang.




pr

Discussion of Models as Approximations I & II

Sara van de Geer.

Source: Statistical Science, Volume 34, Number 4, 566--568.

Abstract:
We discuss the papers “Models as Approximations” I & II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zao and K. Zhang (Part I) and A. Buja, L. Brown, A. K. Kuchibhota, R. Berk, E. George and L. Zhao (Part II). We present a summary with some details for the generalized linear model.




pr

Models as Approximations II: A Model-Free Theory of Parametric Regression

Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Linda Zhao.

Source: Statistical Science, Volume 34, Number 4, 545--565.

Abstract:
We develop a model-free theory of general types of parametric regression for i.i.d. observations. The theory replaces the parameters of parametric models with statistical functionals, to be called “regression functionals,” defined on large nonparametric classes of joint ${x extrm{-}y}$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary ${x extrm{-}y}$ distributions, without assuming a linear model (see Part I). More generally, regression functionals can be defined by minimizing objective functions, solving estimating equations, or with ad hoc constructions. In this framework, it is possible to achieve the following: (1) define a notion of “well-specification” for regression functionals that replaces the notion of correct specification of models, (2) propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3) decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4) exhibit plug-in/sandwich estimators of standard error as limit cases of ${x extrm{-}y}$ bootstrap estimators, and (5) provide theoretical heuristics to indicate that ${x extrm{-}y}$ bootstrap standard errors may generally be preferred over sandwich estimators.




pr

Models as Approximations I: Consequences Illustrated with Linear Regression

Andreas Buja, Lawrence Brown, Richard Berk, Edward George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao.

Source: Statistical Science, Volume 34, Number 4, 523--544.

Abstract:
In the early 1980s, Halbert White inaugurated a “model-robust” form of statistical inference based on the “sandwich estimator” of standard error. This estimator is known to be “heteroskedasticity-consistent,” but it is less well known to be “nonlinearity-consistent” as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence cannot be treated as fixed. The consequences are deep: (1) population slopes need to be reinterpreted as statistical functionals obtained from OLS fits to largely arbitrary joint ${x extrm{-}y}$ distributions; (2) the meaning of slope parameters needs to be rethought; (3) the regressor distribution affects the slope parameters; (4) randomness of the regressors becomes a source of sampling variability in slope estimates of order $1/sqrt{N}$; (5) inference needs to be based on model-robust standard errors, including sandwich estimators or the ${x extrm{-}y}$ bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test.




pr

Producing Official County-Level Agricultural Estimates in the United States: Needs and Challenges

Nathan B. Cruze, Andreea L. Erciulescu, Balgobin Nandram, Wendy J. Barboza, Linda J. Young.

Source: Statistical Science, Volume 34, Number 2, 301--316.

Abstract:
In the United States, county-level estimates of crop yield, production, and acreage published by the United States Department of Agriculture’s National Agricultural Statistics Service (USDA NASS) play an important role in determining the value of payments allotted to farmers and ranchers enrolled in several federal programs. Given the importance of these official county-level crop estimates, NASS continually strives to improve its crops county estimates program in terms of accuracy, reliability and coverage. In 2015, NASS engaged a panel of experts convened under the auspices of the National Academies of Sciences, Engineering, and Medicine Committee on National Statistics (CNSTAT) for guidance on implementing models that may synthesize multiple sources of information into a single estimate, provide defensible measures of uncertainty, and potentially increase the number of publishable county estimates. The final report titled Improving Crop Estimates by Integrating Multiple Data Sources was released in 2017. This paper discusses several needs and requirements for NASS county-level crop estimates that were illuminated during the activities of the CNSTAT panel. A motivating example of planted acreage estimation in Illinois illustrates several challenges that NASS faces as it considers adopting any explicit model for official crops county estimates.




pr

A Kernel Regression Procedure in the 3D Shape Space with an Application to Online Sales of Children’s Wear

Gregorio Quintana-Ortí, Amelia Simó.

Source: Statistical Science, Volume 34, Number 2, 236--252.

Abstract:
This paper is focused on kernel regression when the response variable is the shape of a 3D object represented by a configuration matrix of landmarks. Regression methods on this shape space are not trivial because this space has a complex finite-dimensional Riemannian manifold structure (non-Euclidean). Papers about it are scarce in the literature, the majority of them are restricted to the case of a single explanatory variable, and many of them are based on the approximated tangent space. In this paper, there are several methodological innovations. The first one is the adaptation of the general method for kernel regression analysis in manifold-valued data to the three-dimensional case of Kendall’s shape space. The second one is its generalization to the multivariate case and the addressing of the curse-of-dimensionality problem. Finally, we propose bootstrap confidence intervals for prediction. A simulation study is carried out to check the goodness of the procedure, and a comparison with a current approach is performed. Then, it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population with a potential application to online sales of children’s wear.




pr

Gaussian Integrals and Rice Series in Crossing Distributions—to Compute the Distribution of Maxima and Other Features of Gaussian Processes

Georg Lindgren.

Source: Statistical Science, Volume 34, Number 1, 100--128.

Abstract:
We describe and compare how methods based on the classical Rice’s formula for the expected number, and higher moments, of level crossings by a Gaussian process stand up to contemporary numerical methods to accurately deal with crossing related characteristics of the sample paths. We illustrate the relative merits in accuracy and computing time of the Rice moment methods and the exact numerical method, developed since the late 1990s, on three groups of distribution problems, the maximum over a finite interval and the waiting time to first crossing, the length of excursions over a level, and the joint period/amplitude of oscillations. We also treat the notoriously difficult problem of dependence between successive zero crossing distances. The exact solution has been known since at least 2000, but it has remained largely unnoticed outside the ocean science community. Extensive simulation studies illustrate the accuracy of the numerical methods. As a historical introduction an attempt is made to illustrate the relation between Rice’s original formulation and arguments and the exact numerical methods.




pr

[Silhouette of a pregant woman smoking with death skull inside womb, 29 January 1994] / design: Biman Mullick.

London, [29 January 1994]




pr

The Joyful Reduction of Uncertainty: Music Perception as a Window to Predictive Neuronal Processing




pr

The Cognitive Thalamus as a Gateway to Mental Representations

Mathieu Wolff
Jan 2, 2019; 39:3-14
Viewpoints




pr

Brain-Derived Neurotrophic Factor Protection of Cortical Neurons from Serum Withdrawal-Induced Apoptosis Is Inhibited by cAMP

Steven Poser
Jun 1, 2003; 23:4420-4427
Cellular




pr

Physical Exercise Prevents Stress-Induced Activation of Granule Neurons and Enhances Local Inhibitory Mechanisms in the Dentate Gyrus

Timothy J. Schoenfeld
May 1, 2013; 33:7770-7777
BehavioralSystemsCognitive




pr

Sleep Deprivation Biases the Neural Mechanisms Underlying Economic Preferences

Vinod Venkatraman
Mar 9, 2011; 31:3712-3718
BehavioralSystemsCognitive




pr

Dissociable Intrinsic Connectivity Networks for Salience Processing and Executive Control

William W. Seeley
Feb 28, 2007; 27:2349-2356
BehavioralSystemsCognitive




pr

Synaptic Specificity and Application of Anterograde Transsynaptic AAV for Probing Neural Circuitry

Brian Zingg
Apr 15, 2020; 40:3250-3267
Systems/Circuits




pr

The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality

Fatma Deniz
Sep 25, 2019; 39:7722-7736
BehavioralSystemsCognitive




pr

Dural Calcitonin Gene-Related Peptide Produces Female-Specific Responses in Rodent Migraine Models

Amanda Avona
May 29, 2019; 39:4323-4331
Systems/Circuits




pr

What Visual Information Is Processed in the Human Dorsal Stream?

Martin N. Hebart
Jun 13, 2012; 32:8107-8109
Journal Club




pr

Endothelial Adora2a Activation Promotes Blood-Brain Barrier Breakdown and Cognitive Impairment in Mice with Diet-Induced Insulin Resistance

Masaki Yamamoto
May 22, 2019; 39:4179-4192
Neurobiology of Disease




pr

Sleep Loss Promotes Astrocytic Phagocytosis and Microglial Activation in Mouse Cerebral Cortex

Michele Bellesi
May 24, 2017; 37:5263-5273
Cellular




pr

Increased Neural Activity in Mesostriatal Regions after Prefrontal Transcranial Direct Current Stimulation and L-DOPA Administration

Benjamin Meyer
Jul 3, 2019; 39:5326-5335
Systems/Circuits