Bayesian Estimation Under Informative Sampling with Unattenuated Dependence
Published on projecteuclid.org: Mon, 13 Jan 2020 04:00 EST
Matthew R. Williams, Terrance D. Savitsky. Source: Bayesian Analysis, Volume 15, Number 1, 57–77.
Abstract: An informative sampling design leads to unit inclusion probabilities that are correlated with the response variable of interest. However, multistage sampling designs may also induce higher-order dependencies, which are ignored in the literature when establishing consistency of estimators for survey data under a condition requiring asymptotic independence among the unit inclusion probabilities. This paper constructs new theoretical conditions that guarantee that the pseudo-posterior, which uses sampling weights based on first-order inclusion probabilities to exponentiate the likelihood, is consistent not only for survey designs that have asymptotic factorization, but also for survey designs that induce residual or unattenuated dependence among sampled units. The use of the survey-weighted pseudo-posterior, together with our relaxed requirements for the survey design, establishes a wide variety of analysis models that can be applied to a broad class of survey data sets. Using the complex sampling design of the National Survey on Drug Use and Health, we demonstrate our new theoretical result on multistage designs characterized by a cluster sampling step that expresses within-cluster dependence. We explore the impact of multistage designs and order-based sampling.
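As a rough illustration of how sampling weights exponentiate the likelihood, the sketch below uses a Bernoulli response with a conjugate Beta prior. The model, the function name `pseudo_posterior_beta`, and the normalization of the weights $w_i \propto 1/\pi_i$ so that they sum to the sample size are assumptions made for illustration, not details taken from the paper. The point is only that raising each unit's likelihood contribution to the power $w_i$ keeps the Beta form.

```python
def pseudo_posterior_beta(y, incl_prob, a=1.0, b=1.0):
    """Survey-weighted pseudo-posterior for a Bernoulli rate with a Beta(a, b) prior.

    Each unit's likelihood p**y_i * (1 - p)**(1 - y_i) is raised to the power w_i,
    where w_i is proportional to 1 / incl_prob[i] and the weights are rescaled to
    sum to the sample size n (an assumed normalization).
    """
    if len(y) != len(incl_prob):
        raise ValueError("y and incl_prob must have the same length")
    n = len(y)
    raw = [1.0 / p for p in incl_prob]
    scale = n / sum(raw)
    w = [r * scale for r in raw]
    # The exponentiated Bernoulli likelihood stays conjugate to the Beta prior.
    a_post = a + sum(wi * yi for wi, yi in zip(w, y))
    b_post = b + sum(wi * (1 - yi) for wi, yi in zip(w, y))
    return a_post, b_post


if __name__ == "__main__":
    # Units with y = 1 were over-sampled (higher inclusion probability).
    a_post, b_post = pseudo_posterior_beta([1, 1, 1, 0], [0.8, 0.8, 0.8, 0.2])
    print(a_post, b_post, a_post / (a_post + b_post))
```

In this toy case the over-sampled units with y = 1 are down-weighted, so the pseudo-posterior mean drops from 4/6 (unweighted, uniform prior) to 19/42, about 0.45.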
Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling
Published on projecteuclid.org: Thu, 19 Dec 2019 22:10 EST
Andrea Cremaschi, Raffaele Argiento, Katherine Shoemaker, Christine Peterson, Marina Vannucci. Source: Bayesian Analysis, Volume 14, Number 4, 1271–1301.
Abstract: Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate $t$-distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet $t$-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas.
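The divisor construction can be sketched in a few lines. This minimal illustration assumes an identity covariance for the Gaussian part and divisors $\tau_g \sim \mathrm{Gamma}(\nu/2, \text{rate } \nu/2)$ applied through $\sqrt{\tau_g}$, the standard scale-mixture form of the $t$. The group labels stand in for a clustering of the divisors such as a Dirichlet process (or a NormCRM) would induce; the function name `sample_grouped_t` is hypothetical.

```python
import math
import random


def sample_grouped_t(labels, nu, rng):
    """Draw one vector whose components share gamma divisors by group label.

    Component j is Z_j / sqrt(tau[labels[j]]), with Z_j ~ N(0, 1) (identity
    covariance, assumed for brevity) and tau_g ~ Gamma(shape=nu/2, rate=nu/2).
    A single shared label gives the classical multivariate t; all-distinct
    labels give a separate divisor for every component.
    Returns the vector and the dict of divisors keyed by label.
    """
    taus = {}
    x = []
    for g in labels:
        if g not in taus:
            # random.gammavariate takes (shape, scale); scale = 1 / rate.
            taus[g] = rng.gammavariate(nu / 2.0, 2.0 / nu)
        x.append(rng.gauss(0.0, 1.0) / math.sqrt(taus[g]))
    return x, taus


# Components 0-1 share one divisor; components 2-4 share another.
x, taus = sample_grouped_t([0, 0, 1, 1, 1], nu=3.0, rng=random.Random(0))
```

Components that share a small divisor are inflated together, which is how a cluster of variables can be "contaminated" jointly.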
Calibration Procedures for Approximate Bayesian Credible Sets
Published on projecteuclid.org: Thu, 19 Dec 2019 22:10 EST
Jeong Eun Lee, Geoff K. Nicholls, Robin J. Ryder. Source: Bayesian Analysis, Volume 14, Number 4, 1245–1269.
Abstract: We develop and apply two calibration procedures for checking the coverage of approximate Bayesian credible sets, including intervals estimated using Monte Carlo methods. The user has an ideal prior and likelihood, but generates a credible set for an approximate posterior based on some approximate prior and likelihood. We estimate the realised posterior coverage achieved by the approximate credible set. This is the coverage of the unknown “true” parameter if the data are a realisation of the user’s ideal observation model conditioned on the parameter, and the parameter is a draw from the user’s ideal prior. In one approach we estimate the posterior coverage at the data by making a semi-parametric logistic regression of binary coverage outcomes on simulated data against summary statistics evaluated on simulated data. In another we use importance sampling from the approximate posterior, windowing simulated data to fall close to the observed data. We illustrate our methods on four examples.
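A much simpler relative of these checks estimates coverage averaged over all data drawn from the ideal model, rather than at the observed data as the paper does. The sketch below assumes a conjugate normal model: the ideal prior is $N(0, s_0^2)$, while the approximate posterior is computed under a different prior scale. All names and settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def approx_interval(ybar, n, sigma, prior_sd, z):
    """Central credible interval for theta under a N(0, prior_sd^2) prior and
    n observations N(theta, sigma^2) summarized by their mean ybar."""
    prec = 1.0 / prior_sd**2 + n / sigma**2
    mean = (n * ybar / sigma**2) / prec
    half = z / math.sqrt(prec)
    return mean - half, mean + half


def estimate_coverage(ideal_prior_sd, approx_prior_sd, n=5, sigma=1.0,
                      level=0.9, reps=20000, seed=1):
    """Fraction of simulated (theta, data) pairs from the ideal model whose
    approximate credible interval contains theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = 0
    for _ in range(reps):
        theta = rng.gauss(0.0, ideal_prior_sd)          # parameter from the ideal prior
        ybar = rng.gauss(theta, sigma / math.sqrt(n))   # data from the ideal model
        lo, hi = approx_interval(ybar, n, sigma, approx_prior_sd, z)
        hits += lo <= theta <= hi
    return hits / reps
```

When the approximate prior matches the ideal one, the estimate sits near the nominal 0.9; an overly tight approximate prior, such as `estimate_coverage(3.0, 0.1)`, yields far lower coverage.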
Spatial Disease Mapping Using Directed Acyclic Graph Auto-Regressive (DAGAR) Models
Published on projecteuclid.org: Thu, 19 Dec 2019 22:10 EST
Abhirup Datta, Sudipto Banerjee, James S. Hodges, Leiwen Gao. Source: Bayesian Analysis, Volume 14, Number 4, 1221–1244.
Abstract: Hierarchical models for regionally aggregated disease incidence data commonly involve region-specific latent random effects that are modeled jointly as having a multivariate Gaussian distribution. The covariance or precision matrix incorporates the spatial dependence between the regions. Common choices for the precision matrix include the widely used ICAR model, which is singular, and its nonsingular extension, which lacks interpretability. We propose a new parametric model for the precision matrix based on a directed acyclic graph (DAG) representation of the spatial dependence. Our model guarantees positive definiteness and, hence, in addition to being a valid prior for regional spatially correlated random effects, can also directly model the outcome from dependent data like images and networks. Theoretical results establish a link between the parameters in our model and the variance and covariances of the random effects. Simulation studies demonstrate that the improved interpretability of our model reaps benefits in terms of accurately recovering the latent spatial random effects as well as for inference on the spatial covariance parameters. Under modest spatial correlation, our model far outperforms the CAR models, while the performances are similar when the spatial correlation is strong. We also assess sensitivity to the choice of the ordering in the DAG construction using theoretical and empirical results which testify to the robustness of our model. We also present a large-scale public health application demonstrating the competitive performance of the model.
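The positive-definiteness guarantee can be made concrete. If $B$ is strictly lower triangular in the chosen ordering and $F$ is diagonal with positive entries, then $Q = (I - B)^\top F (I - B)$ is positive definite, because $I - B$ is unit lower triangular and therefore invertible. The sketch below builds such a $Q$ from a neighbor list. The specific weights are our reading of a DAGAR-style parameterization and should be treated as an assumption; the function names are hypothetical.

```python
def dag_precision(neighbors, rho):
    """Precision Q = (I - B)^T F (I - B) for regions ordered 0..n-1.

    neighbors[i] lists the spatial neighbors of region i; only those with a
    smaller index act as directed parents. Assumed DAGAR-style weights, with
    k the parent count and 0 <= rho < 1:
      b = rho / (1 + (k - 1) rho^2),  F_ii = (1 + (k - 1) rho^2) / (1 - rho^2).
    """
    n = len(neighbors)
    B = [[0.0] * n for _ in range(n)]
    F = [0.0] * n
    for i in range(n):
        parents = [j for j in neighbors[i] if j < i]
        k = len(parents)
        denom = 1.0 + (k - 1) * rho**2
        for j in parents:
            B[i][j] = rho / denom
        F[i] = denom / (1.0 - rho**2)
    # L = I - B is unit lower triangular, hence invertible.
    L = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(n)] for i in range(n)]
    # (L^T F L)_{ij} = sum_r L[r][i] * F[r] * L[r][j]
    return [[sum(L[r][i] * F[r] * L[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def is_positive_definite(Q):
    """Attempt a Cholesky factorization; succeeds only for positive definite Q."""
    n = len(Q)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(C[i][m] * C[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                C[i][i] = s ** 0.5
            else:
                C[i][j] = s / C[j][j]
    return True


# Four regions on a path 0 - 1 - 2 - 3.
Q = dag_precision([[1], [0, 2], [1, 3], [2]], rho=0.5)
```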
Estimating the Use of Public Lands: Integrated Modeling of Open Populations with Convolution Likelihood Ecological Abundance Regression
Published on projecteuclid.org: Thu, 19 Dec 2019 22:10 EST
Lutz F. Gruber, Erica F. Stuber, Lyndsie S. Wszola, Joseph J. Fontaine. Source: Bayesian Analysis, Volume 14, Number 4, 1173–1199.
Abstract: We present an integrated open population model where the population dynamics are defined by a differential equation, and the related statistical model utilizes a Poisson binomial convolution likelihood. Key advantages of the proposed approach over existing open population models include the flexibility to predict related but unobserved quantities, such as total immigration or emigration over a specified time period, and more computationally efficient posterior simulation, achieved by eliminating the need to explicitly simulate latent immigration and emigration. The viability of the proposed method is shown in an in-depth analysis of outdoor recreation participation on public lands, where the surveyed populations changed rapidly and demographic population closure cannot be assumed even within a single day.
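One way to read a Poisson binomial convolution likelihood, assumed here for illustration rather than taken from the paper, is that an observed count equals the units that stay (binomial) plus new arrivals (Poisson). Summing over the unobserved split marginalizes the latent flows instead of simulating them. Function names and parameters are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def conv_pmf(count, n_prev, p_stay, lam):
    """P(count = stayers + arrivals), where stayers ~ Binomial(n_prev, p_stay)
    and arrivals ~ Poisson(lam), summing over the unobserved split."""
    return sum(binom_pmf(s, n_prev, p_stay) * poisson_pmf(count - s, lam)
               for s in range(min(count, n_prev) + 1))


# Likelihood of observing 9 units when 10 were present, 60% stay, ~2 arrive.
print(conv_pmf(9, 10, 0.6, 2.0))
```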
Post-Processing Posteriors Over Precision Matrices to Produce Sparse Graph Estimates
Published on projecteuclid.org: Thu, 19 Dec 2019 22:10 EST
Amir Bashir, Carlos M. Carvalho, P. Richard Hahn, M. Beatrix Jones. Source: Bayesian Analysis, Volume 14, Number 4, 1075–1090.
Abstract: A variety of computationally efficient Bayesian models for the covariance matrix of a multivariate Gaussian distribution are available. However, all produce a relatively dense estimate of the precision matrix, and are therefore unsatisfactory when one wishes to use the precision matrix to consider the conditional independence structure of the data. This paper considers the posterior predictive distribution of model fit for these covariance models. We then undertake post-processing of the Bayes point estimate for the precision matrix to produce a sparse model whose expected fit lies within the upper 95% of the posterior predictive distribution of fit. The impact of the method for selecting the zero elements of the precision matrix is evaluated. Good results were obtained using models that encouraged a sparse posterior (G-Wishart, Bayesian adaptive graphical lasso) and selection using credible intervals. We also find that this approach is easily extended to the problem of finding a sparse set of elements that differ across a set of precision matrices, a natural summary when a common set of variables is observed under multiple conditions. We illustrate our findings with moderate-dimensional data examples from finance and metabolomics.
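The credible-interval selection rule mentioned above can be sketched as follows: average the posterior draws of each precision entry, and set an off-diagonal entry to zero when its central 95% interval covers zero. This sketch omits the paper's calibration against the posterior predictive distribution of fit; the function name and the input format (a list of sampled matrices as nested lists) are assumptions.

```python
from statistics import fmean, quantiles


def sparsify(draws):
    """Posterior-mean precision estimate with off-diagonal entries set to zero
    when their central 95% credible interval covers zero.

    draws: list of posterior samples, each a symmetric p x p matrix (nested lists).
    """
    p = len(draws[0])
    est = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            vals = [d[i][j] for d in draws]
            m = fmean(vals)
            if i != j:
                cuts = quantiles(vals, n=40, method="inclusive")
                lo, hi = cuts[0], cuts[-1]  # 2.5% and 97.5% points
                if lo <= 0.0 <= hi:
                    m = 0.0
            est[i][j] = est[j][i] = m
    return est
```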
Extrinsic Gaussian Processes for Regression and Classification on Manifolds
Published on projecteuclid.org: Tue, 11 Jun 2019 04:00 EDT
Lizhen Lin, Niu Mu, Pokman Cheung, David Dunson. Source: Bayesian Analysis, Volume 14, Number 3, 907–926.
Abstract: Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory and algorithms related to GPs, the overwhelming majority of this literature focuses on the case in which the input domain corresponds to a Euclidean space. However, particularly in recent years with the increasing collection of complex data, it is commonly the case that the input domain does not have such a simple form. For example, it is common for the inputs to be restricted to a non-Euclidean manifold, a case which forms the motivation for this article. In particular, we propose a general extrinsic framework for GP modeling on manifolds, which relies on embedding of the manifold into a Euclidean space and then constructing extrinsic kernels for GPs on their images. These extrinsic Gaussian processes (eGPs) are used as prior distributions for unknown functions in Bayesian inference. Our approach is simple and general, and we show that the eGPs inherit fine theoretical properties from GP models in Euclidean spaces. We consider applications of our models to regression and classification problems with predictors lying in a large class of manifolds, including spheres, planar shape spaces, a space of positive definite matrices, and Grassmannians. Our models can be readily used by practitioners in biological sciences for various regression and classification problems, such as disease diagnosis or detection. Our work is also likely to have impact in spatial statistics when spatial locations are on the sphere or other geometric spaces.
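The extrinsic idea can be sketched minimally on the sphere $S^2$: embed each point in $\mathbb{R}^3$ and evaluate an ordinary squared-exponential kernel on the Euclidean (chordal) distance between the embedded points. Because that kernel is positive definite on $\mathbb{R}^3$, it remains positive definite when restricted to the embedded sphere. The kernel choice, parameter names and latitude/longitude input format are assumptions for illustration.

```python
import math


def embed_sphere(lat_deg, lon_deg):
    """Map latitude/longitude (degrees) on the unit sphere to a point in R^3."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def extrinsic_se_kernel(points, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel on chordal distances between embedded points."""
    emb = [embed_sphere(lat, lon) for lat, lon in points]
    K = []
    for a in emb:
        row = []
        for b in emb:
            d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
            row.append(variance * math.exp(-d2 / (2.0 * length_scale**2)))
        K.append(row)
    return K


# Antipodal points have chordal distance 2, so their covariance is exp(-2).
K = extrinsic_se_kernel([(0.0, 0.0), (0.0, 180.0), (90.0, 0.0)])
```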
Bayesian Zero-Inflated Negative Binomial Regression Based on Pólya-Gamma Mixtures
Published on projecteuclid.org: Tue, 11 Jun 2019 04:00 EDT
Brian Neelon. Source: Bayesian Analysis, Volume 14, Number 3, 849–875.
Abstract: Motivated by a study examining spatiotemporal patterns in inpatient hospitalizations, we propose an efficient Bayesian approach for fitting zero-inflated negative binomial models. To facilitate posterior sampling, we introduce a set of latent variables that are represented as scale mixtures of normals, where the precision terms follow independent Pólya-Gamma distributions. Conditional on the latent variables, inference proceeds via straightforward Gibbs sampling. For fixed-effects models, our approach is comparable to existing methods. However, our model can accommodate more complex data structures, including multivariate and spatiotemporal data, settings in which current approaches often fail due to computational challenges. Using simulation studies, we highlight key features of the method and compare its performance to other estimation procedures. We apply the approach to a spatiotemporal analysis examining the number of annual inpatient admissions among United States veterans with type 2 diabetes.
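For reference, a zero-inflated negative binomial likelihood is a mixture of a point mass at zero and a negative binomial. The sketch evaluates it in a mean–dispersion parameterization (size `r`, mean `mu`), which is assumed here for illustration; the Pólya-Gamma data augmentation used for posterior sampling is not shown.

```python
import math


def nb_logpmf(y, r, mu):
    """Log pmf of a negative binomial with size r > 0 and mean mu > 0."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_pmf(y, pi, r, mu):
    """Zero-inflated NB: a structural zero with probability pi, else NB(r, mu)."""
    nb = math.exp(nb_logpmf(y, r, mu))
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb


# P(Y = 0) mixes structural zeros with NB zeros: 0.3 + 0.7 * (2/5)^2.
print(zinb_pmf(0, 0.3, 2.0, 3.0))
```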
Probability Based Independence Sampler for Bayesian Quantitative Learning in Graphical Log-Linear Marginal Models
Published on projecteuclid.org: Tue, 11 Jun 2019 04:00 EDT
Ioannis Ntzoufras, Claudia Tarantola, Monia Lupparelli. Source: Bayesian Analysis, Volume 14, Number 3, 797–823.
Abstract: We introduce a novel Bayesian approach for quantitative learning for graphical log-linear marginal models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. The likelihood cannot be analytically expressed as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be directly obtained, and Markov chain Monte Carlo (MCMC) methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for such models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. We exploit a conditional conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset.
Sequential Monte Carlo Samplers with Independent Markov Chain Monte Carlo Proposals
Published on projecteuclid.org: Tue, 11 Jun 2019 04:00 EDT
L. F. South, A. N. Pettitt, C. C. Drovandi. Source: Bayesian Analysis, Volume 14, Number 3, 773–796.
Abstract: Sequential Monte Carlo (SMC) methods for sampling from the posterior of static Bayesian models are flexible, parallelisable and capable of handling complex targets. However, it is common practice to adopt a Markov chain Monte Carlo (MCMC) kernel with a multivariate normal random walk (RW) proposal in the move step, which can be both inefficient and detrimental for exploring challenging posterior distributions. We develop new SMC methods with independent proposals which allow recycling of all candidates generated in the SMC process and are embarrassingly parallelisable. A novel evidence estimator that is easily computed from the output of our independent SMC is proposed. Our independent proposals are constructed via flexible copula-type models calibrated with the population of SMC particles. We demonstrate through several examples that more precise estimates of posterior expectations and the marginal likelihood can be obtained using fewer likelihood evaluations than the more standard RW approach.
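The basic move with an independent proposal is an independent Metropolis–Hastings step, in which the candidate does not depend on the current state. The sketch below shows only that step, with a fixed Gaussian proposal standing in for the copula-type proposals that the paper fits to the SMC particles; the names and the toy target are assumptions.

```python
import math
import random


def independent_mh_step(x, log_target, log_proposal, sample_proposal, rng):
    """One independent Metropolis-Hastings move.

    The candidate y comes from a fixed proposal q that ignores x, so the
    acceptance probability is min(1, pi(y) q(x) / (pi(x) q(y))).
    """
    y = sample_proposal(rng)
    log_ratio = (log_target(y) - log_target(x)) + (log_proposal(x) - log_proposal(y))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y, True
    return x, False


def run_chain(n_steps, seed=0):
    """Toy run: unnormalized N(0, 1) target with a wider N(0, 2^2) proposal."""
    rng = random.Random(seed)

    def log_target(v):
        return -0.5 * v * v

    def log_proposal(v):
        return -v * v / 8.0  # N(0, 4) log density up to a constant

    def sample_proposal(r):
        return r.gauss(0.0, 2.0)

    x, chain, accepted = 0.0, [], 0
    for _ in range(n_steps):
        x, acc = independent_mh_step(x, log_target, log_proposal, sample_proposal, rng)
        accepted += acc
        chain.append(x)
    return chain, accepted / n_steps
```

Because the proposal never looks at the current state, candidates can be generated in parallel ahead of time, which is one reason independent proposals suit SMC.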
Stochastic Approximations to the Pitman–Yor Process
Published on projecteuclid.org: Tue, 11 Jun 2019 04:00 EDT
Julyan Arbel, Pierpaolo De Blasi, Igor Prünster. Source: Bayesian Analysis, Volume 14, Number 3, 753–771.
Abstract: In this paper we consider approximations to the popular Pitman–Yor process obtained by truncating the stick-breaking representation. The truncation is determined by a random stopping rule that achieves an almost sure control on the approximation error in total variation distance. We derive the asymptotic distribution of the random truncation point as the approximation error $\epsilon$ goes to zero in terms of a polynomially tilted positive stable random variable. The practical usefulness and effectiveness of this theoretical result is demonstrated by devising a sampling algorithm to approximate functionals of the $\epsilon$-version of the Pitman–Yor process.
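A minimal sketch of a randomly truncated stick-breaking draw follows. It uses the standard Pitman–Yor sticks $V_k \sim \mathrm{Beta}(1 - \sigma, \theta + k\sigma)$ and stops once the leftover stick mass falls below $\epsilon$, which bounds the total variation error of the truncation. We assume this as the stopping rule for illustration; the function name and the `max_atoms` safeguard are hypothetical.

```python
import random


def py_stick_breaking(theta, sigma, eps, rng, max_atoms=10**6):
    """Pitman-Yor weights by stick-breaking, stopped once the leftover stick
    mass is below eps (assumed stopping rule). Returns (weights, leftover)."""
    if not (0.0 <= sigma < 1.0 and theta > -sigma and eps > 0.0):
        raise ValueError("need 0 <= sigma < 1, theta > -sigma and eps > 0")
    weights, remaining, k = [], 1.0, 0
    while remaining >= eps:
        k += 1
        if k > max_atoms:
            raise RuntimeError("truncation point exceeded max_atoms")
        v = rng.betavariate(1.0 - sigma, theta + k * sigma)  # V_k ~ Beta(1 - sigma, theta + k sigma)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


w, rest = py_stick_breaking(theta=1.0, sigma=0.5, eps=0.01, rng=random.Random(42))
print(len(w), rest)  # the random truncation point and the leftover mass
```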
Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models
Published on projecteuclid.org: Tue, 11 Jun 2019 04:00 EDT
Yushu Shi, Michael Martens, Anjishnu Banerjee, Purushottam Laud. Source: Bayesian Analysis, Volume 14, Number 3, 677–702.
Abstract: Dirichlet process mixture (DPM) models provide flexible modeling for distributions of data as an infinite mixture of distributions from a chosen collection. Specifying priors for these models in individual data contexts can be challenging. In this paper, we introduce a scheme which requires the investigator to specify only simple scaling information. This is used to transform the data to a fixed scale on which a low information prior is constructed. Samples from the posterior with the rescaled data are transformed back for inference on the original scale. The low information prior is selected to provide a wide variety of components for the DPM to generate flexible distributions for the data on the fixed scale. The method can be applied to all DPM models with kernel functions closed under a suitable scaling transformation. Construction of the low information prior, however, is kernel dependent. Using DPM-of-Gaussians and DPM-of-Weibulls models as examples, we show that the method provides accurate estimates of a diverse collection of distributions that includes skewed, multimodal, and highly dispersed members. With the recommended priors, repeated data simulations show performance comparable to that of standard empirical estimates. Finally, we show weak convergence of posteriors with the proposed priors for both kernels considered.
Efficient Acquisition Rules for Model-Based Approximate Bayesian Computation
Published on projecteuclid.org: Wed, 13 Mar 2019 22:00 EDT
Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen. Source: Bayesian Analysis, Volume 14, Number 2, 595–622.
Abstract: Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies.
Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model
Published on projecteuclid.org: Wed, 13 Mar 2019 22:00 EDT
Łukasz Rajkowski. Source: Bayesian Analysis, Volume 14, Number 2, 477–494.
Abstract: Mixture models are a natural choice in many applications, but it can be difficult to place an a priori upper bound on the number of components. To circumvent this, investigators are turning increasingly to Dirichlet process mixture models (DPMMs). It is therefore important to develop an understanding of the strengths and weaknesses of this approach. This work considers the MAP (maximum a posteriori) clustering for the Gaussian DPMM (where the cluster means have Gaussian distribution and, for each cluster, the observations within the cluster have Gaussian distribution). Some desirable properties of the MAP partition are proved: ‘almost disjointness’ of the convex hulls of clusters (they may have at most one point in common) and (with natural assumptions) the comparability of sizes of those clusters that intersect any fixed ball with the number of observations (as the latter goes to infinity). Consequently, the number of such clusters remains bounded. Furthermore, if the data arise from independent identically distributed sampling from a given distribution with bounded support, then the asymptotic MAP partition of the observation space maximises a function which has a straightforward expression, which depends only on the within-group covariance parameter. As the operator norm of this covariance parameter decreases, the number of clusters in the MAP partition becomes arbitrarily large, which may lead to the overestimation of the number of mixture components.
A Bayesian Approach to Statistical Shape Analysis via the Projected Normal Distribution
Published on projecteuclid.org: Wed, 13 Mar 2019 22:00 EDT
Luis Gutiérrez, Eduardo Gutiérrez-Peña, Ramsés H. Mena. Source: Bayesian Analysis, Volume 14, Number 2, 427–447.
Abstract: This work presents a Bayesian predictive approach to statistical shape analysis. A modeling strategy that starts with a Gaussian distribution on the configuration space, and then removes the effects of location, rotation and scale, is studied. This boils down to an application of the projected normal distribution to model the configurations in the shape space, which, together with certain identifiability constraints, facilitates parameter interpretation. Having better control over the parameters allows us to generalize the model to a regression setting where the effect of predictors on shapes can be considered. The methodology is illustrated and tested using both simulated scenarios and a real data set concerning eight anatomical landmarks on a sagittal plane of the corpus callosum in patients with autism and in a group of controls.
Maximum Independent Component Analysis with Application to EEG Data
Published on projecteuclid.org: Tue, 03 Mar 2020 04:00 EST
Ruosi Guo, Chunming Zhang, Zhengjun Zhang. Source: Statistical Science, Volume 35, Number 1, 145–157.
Abstract: In many scientific disciplines, finding hidden influential factors behind observational data is essential but challenging. The majority of existing approaches, such as independent component analysis (ICA), rely on linear transformations; that is, true signals are linear combinations of hidden components. Motivated by the analysis of nonlinear temporal signals in neuroscience, genetics, and finance, this paper proposes “maximum independent component analysis” (MaxICA), based on max-linear combinations of components. In contrast to existing methods, MaxICA benefits from focusing on significant major components while filtering out ignorable components. A major tool for parameter learning of MaxICA is an augmented genetic algorithm, consisting of three schemes for the elite weighted sum selection, randomly combined crossover, and dynamic mutation. Extensive empirical evaluations demonstrate the effectiveness of MaxICA in either extracting max-linearly combined essential sources in many applications or supplying a better approximation for nonlinearly combined source signals, such as the EEG recordings analyzed in this paper.
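The contrast between linear and max-linear mixing is easy to see on a toy example. The sketch assumes nonnegative weights and sources, as is usual for max-linear models, and does not attempt the genetic-algorithm fitting described in the abstract; the function names are hypothetical.

```python
def linear_mix(A, z):
    """x_i = sum_j A[i][j] * z[j]: every component contributes to every signal."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def max_linear_mix(A, z):
    """x_i = max_j A[i][j] * z[j]: each signal is driven by its single
    dominant weighted component, and smaller contributions are ignored."""
    return [max(a * zj for a, zj in zip(row, z)) for row in A]


A = [[1.0, 0.5], [0.2, 2.0]]
z = [3.0, 1.0]
print(linear_mix(A, z))      # approximately [3.5, 2.6]
print(max_linear_mix(A, z))  # [3.0, 2.0]
```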
Some Statistical Issues in Climate Science
Published on projecteuclid.org: Tue, 03 Mar 2020 04:00 EST
Michael L. Stein. Source: Statistical Science, Volume 35, Number 1, 31–41.
Abstract: Climate science is a field that is arguably both data-rich and data-poor. Data rich in that huge and quickly increasing amounts of data about the state of the climate are collected every day. Data poor in that important aspects of the climate are still undersampled, such as the deep oceans and some characteristics of the upper atmosphere. Data rich in that modern climate models can produce climatological quantities over long time periods with global coverage, including quantities that are difficult to measure and under conditions for which there is no data presently. Data poor in that the correspondence between climate model output and the actual climate, especially for future climate change due to human activities, is difficult to assess. The scope for fruitful interactions between climate scientists and statisticians is great, but requires serious commitments from researchers in both disciplines to understand the scientific and statistical nuances arising from the complex relationships between the data and the real-world problems. This paper describes a small fraction of the intellectual challenges that occur at the interface between climate science and statistics, including inferences for extremes for processes with seasonality and long-term trends, the use of climate model ensembles for studying extremes, the scope for using new data sources for studying space-time characteristics of environmental processes and a discussion of non-Gaussian space-time process models for climate variables. The paper concludes with a call to the statistical community to become more engaged in one of the great scientific and policy issues of our time, anthropogenic climate change and its impacts.
Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression
Published on projecteuclid.org: Tue, 03 Mar 2020 04:00 EST
Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong. Source: Statistical Science, Volume 35, Number 1, 2–13.
Abstract: Unsupervised methods, including clustering methods, are essential to the analysis of single-cell genomic data. Model-based clustering methods are under-explored in the area of single-cell genomics, and have the advantage of quantifying the uncertainty of the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that by combining these two types of data, we can achieve a better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed.
Gaussianization Machines for Non-Gaussian Function Estimation Models
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
T. Tony Cai. Source: Statistical Science, Volume 34, Number 4, 635–656.
Abstract: A wide range of nonparametric function estimation models have been studied individually in the literature. Among them the homoscedastic nonparametric Gaussian regression is arguably the best known and understood. Inspired by the asymptotic equivalence theory, Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433) developed a unified approach to turn a collection of non-Gaussian function estimation models into a standard Gaussian regression, so that any good Gaussian nonparametric regression method can then be used. These Gaussianization Machines have two key components, binning and transformation. When combined with BlockJS, a wavelet thresholding procedure for Gaussian regression, the procedures are computationally efficient with strong theoretical guarantees. Technical analysis given in Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433) shows that the estimators attain the optimal rate of convergence adaptively over a large set of Besov spaces and across a collection of non-Gaussian function estimation models, including robust nonparametric regression, density estimation, and nonparametric regression in exponential families. The estimators are also spatially adaptive. The Gaussianization Machines significantly extend the flexibility and scope of the theories and methodologies originally developed for the conventional nonparametric Gaussian regression. This article aims to provide a concise account of the Gaussianization Machines developed in Brown, Cai and Zhou (Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. (Probab. Theory Related Fields 146 (2010) 401–433).
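The two components, binning and transformation, can be sketched for density estimation on $[0, 1]$: count the observations in equal-width bins, then apply a root transform to each count. The constant in $\sqrt{Q_k + 1/4}$ is the Poisson-type choice we assume here. After the transform, the values can be treated approximately as Gaussian regression data with noise variance near 1/4 and passed to a Gaussian method such as BlockJS (not shown). The function name is hypothetical.

```python
import math


def bin_and_root(sample, n_bins):
    """Bin observations on [0, 1] into equal-width bins and root-transform the counts.

    Returns the raw counts Q_k and the transformed values sqrt(Q_k + 1/4).
    """
    counts = [0] * n_bins
    for x in sample:
        if not 0.0 <= x <= 1.0:
            raise ValueError("observations must lie in [0, 1]")
        k = min(int(x * n_bins), n_bins - 1)  # place x = 1.0 in the last bin
        counts[k] += 1
    return counts, [math.sqrt(q + 0.25) for q in counts]


counts, ys = bin_and_root([0.05, 0.1, 0.15, 0.6, 1.0], n_bins=4)
print(counts)  # [3, 0, 1, 1]
```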
Models as Approximations—Rejoinder
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Andreas Buja, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Eric Tchetgen Tchetgen, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 606–620.
Abstract: We respond to the discussants of our articles, emphasizing the importance of inference under misspecification in the context of the reproducibility/replicability crisis. Along the way, we discuss the roles of diagnostics and model building in regression as well as connections between our well-specification framework and semiparametric theory.
Discussion: Models as Approximations
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Dalia Ghanem, Todd A. Kuffner. Source: Statistical Science, Volume 34, Number 4, 604–605.
Comment: Models as (Deliberate) Approximations
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
David Whitney, Ali Shojaie, Marco Carone. Source: Statistical Science, Volume 34, Number 4, 591–598.
Comment: Models Are Approximations!
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Anthony C. Davison, Erwan Koch, Jonathan Koh. Source: Statistical Science, Volume 34, Number 4, 584–590.
Abstract: This discussion focuses on areas of disagreement with the papers, particularly the target of inference and the case for using the robust ‘sandwich’ variance estimator in the presence of moderate mis-specification. We also suggest that existing procedures may be appreciably more powerful for detecting mis-specification than the authors’ RAV statistic, and comment on the use of the pairs bootstrap in balanced situations.
Comment: “Models as Approximations I: Consequences Illustrated with Linear Regression” by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, L. Zhao and K. Zhang
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Roderick J. Little. Source: Statistical Science, Volume 34, Number 4, 580–583.
Discussion of Models as Approximations I & II
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Dag Tjøstheim. Source: Statistical Science, Volume 34, Number 4, 575–579.
Comment: Models as Approximations
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Nikki L. B. Freeman, Xiaotong Jiang, Owen E. Leete, Daniel J. Luckett, Teeranan Pokaprakarn, Michael R. Kosorok. Source: Statistical Science, Volume 34, Number 4, 572–574.
Comment on Models as Approximations, Parts I and II, by Buja et al.
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Jerald F. Lawless. Source: Statistical Science, Volume 34, Number 4, 569–571.
Abstract: I comment on the papers Models as Approximations I and II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang.
Discussion of Models as Approximations I & II
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Sara van de Geer. Source: Statistical Science, Volume 34, Number 4, 566–568.
Abstract: We discuss the papers “Models as Approximations” I & II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang (Part I) and A. Buja, L. Brown, A. K. Kuchibhotla, R. Berk, E. George and L. Zhao (Part II). We present a summary with some details for the generalized linear model.
Models as Approximations II: A Model-Free Theory of Parametric Regression
Published on projecteuclid.org: Wed, 08 Jan 2020 04:00 EST
Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 545–565.
Abstract: We develop a model-free theory of general types of parametric regression for i.i.d. observations. The theory replaces the parameters of parametric models with statistical functionals, to be called “regression functionals,” defined on large nonparametric classes of joint $x\textrm{-}y$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary $x\textrm{-}y$ distributions, without assuming a linear model (see Part I). More generally, regression functionals can be defined by minimizing objective functions, solving estimating equations, or with ad hoc constructions. In this framework, it is possible to achieve the following: (1) define a notion of “well-specification” for regression functionals that replaces the notion of correct specification of models, (2) propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3) decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4) exhibit plug-in/sandwich estimators of standard error as limit cases of $x\textrm{-}y$ bootstrap estimators, and (5) provide theoretical heuristics to indicate that $x\textrm{-}y$ bootstrap standard errors may generally be preferred over sandwich estimators.
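An $x\textrm{-}y$ (pairs) bootstrap resamples whole $(x_i, y_i)$ pairs, so the randomness of the regressors enters the standard error. Below is a minimal sketch for the OLS slope; the function names, the number of resamples, and the choice to skip degenerate resamples are assumptions made for illustration.

```python
import random
from statistics import fmean, stdev


def ols_slope(xs, ys):
    """Slope of the least-squares line fitted to (xs, ys)."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx


def pairs_bootstrap_se(xs, ys, n_boot=2000, seed=0):
    """Pairs bootstrap standard error of the OLS slope: (x_i, y_i) pairs are
    resampled jointly, so regressor randomness is part of the variability."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if max(bx) == min(bx):
            continue  # degenerate resample: slope undefined
        slopes.append(ols_slope(bx, [ys[i] for i in idx]))
    return stdev(slopes)


xs = list(range(20))
print(pairs_bootstrap_se(xs, [x * x for x in xs]))  # nonlinear truth: slope varies with the x-sample
```

For data lying exactly on a line, every resample returns the same slope and the bootstrap standard error is essentially zero; under nonlinearity, the slope depends on which regressor values are drawn.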
ma Models as Approximations I: Consequences Illustrated with Linear Regression By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Andreas Buja, Lawrence Brown, Richard Berk, Edward George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 523--544.Abstract: In the early 1980s, Halbert White inaugurated a “model-robust” form of statistical inference based on the “sandwich estimator” of standard error. This estimator is known to be “heteroskedasticity-consistent,” but it is less well known to be “nonlinearity-consistent” as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence cannot be treated as fixed. The consequences are deep: (1) population slopes need to be reinterpreted as statistical functionals obtained from OLS fits to largely arbitrary joint $x\textrm{-}y$ distributions; (2) the meaning of slope parameters needs to be rethought; (3) the regressor distribution affects the slope parameters; (4) randomness of the regressors becomes a source of sampling variability in slope estimates of order $1/\sqrt{N}$; (5) inference needs to be based on model-robust standard errors, including sandwich estimators or the $x\textrm{-}y$ bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test. Full Article
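To make the contrast between model-trusting and sandwich standard errors concrete, the sketch below computes both for the slope of a simple regression with intercept. It assumes the basic HC0 form of White's estimator; the abstract does not say which sandwich variant the authors use, and the function name is illustrative.

```python
import math
import statistics


def slope_standard_errors(xs, ys):
    """Return (slope, model-trusting SE, HC0 sandwich SE) for simple OLS."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Model-trusting SE: assumes a correct linear model with constant error variance.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 sandwich SE: the slope is sum_i w_i * y_i with w_i = (x_i - x_bar) / Sxx,
    # so its variance is estimated by sum_i w_i^2 * e_i^2.
    meat = sum((x - x_bar) ** 2 * e * e for x, e in zip(xs, resid))
    se_sandwich = math.sqrt(meat) / sxx
    return slope, se_classical, se_sandwich


# Usage: a quadratic response fitted with a line, so the linear model is wrong.
print(slope_standard_errors([-3, -2, -1, 0, 1, 2, 3], [9, 4, 1, 0, 1, 4, 9]))
```

How far the two standard errors differ depends on how the squared residuals line up with the squared centered regressor; a small example like this one is not evidence about their typical gap.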
ma User-Friendly Covariance Estimation for Heavy-Tailed Distributions By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Yuan Ke, Stanislav Minsker, Zhao Ren, Qiang Sun, Wen-Xin Zhou. Source: Statistical Science, Volume 34, Number 3, 454--471.Abstract: We provide a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce elementwise and spectrumwise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix. Unlike the classical notion of robustness, which is characterized by the breakdown property, we focus on tail robustness, which is evidenced by the connection between nonasymptotic deviation and confidence level. The key insight is that estimators should adapt to the sample size, dimensionality and noise level to achieve an optimal tradeoff between bias and robustness. Furthermore, to facilitate practical implementation, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods. Full Article
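A simplified picture of elementwise truncation: each product $x_{ik} x_{il}$ is clipped to $[-\tau, \tau]$ by $\psi_\tau(u) = \operatorname{sign}(u)\min(|u|, \tau)$ before averaging, which caps the influence of any single heavy-tailed observation. This sketch assumes zero-mean data and a fixed, user-supplied $\tau$. The paper's actual estimators and its data-driven calibration of the tuning parameters are not reproduced here, and the function names are hypothetical.

```python
def truncate(u, tau):
    """Truncation operator psi_tau(u) = sign(u) * min(|u|, tau)."""
    return max(-tau, min(u, tau))


def truncated_covariance(rows, tau):
    """Elementwise-truncated second-moment matrix (assumes zero-mean data).

    rows: list of observations, each a list of d coordinates.
    Entry (k, l) is the average of truncate(x_ik * x_il, tau) over observations.
    """
    n = len(rows)
    d = len(rows[0])
    return [
        [sum(truncate(r[k] * r[l], tau) for r in rows) / n for l in range(d)]
        for k in range(d)
    ]


# Usage: a very large tau recovers the plain second-moment matrix,
# while a small tau clips the contribution of the extreme third row.
data = [[1, 2], [-1, 0], [3, -2]]
print(truncated_covariance(data, tau=100.0))
print(truncated_covariance(data, tau=2.0))
```

Smaller $\tau$ buys robustness at the price of bias, which is the tradeoff the abstract says should be tuned to sample size, dimension and noise level.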
ma ROS Regression: Integrating Regularization with Optimal Scaling Regression By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Jacqueline J. Meulman, Anita J. van der Kooij, Kevin L. W. Duisters. Source: Statistical Science, Volume 34, Number 3, 361--390.Abstract: We present a methodology for multiple regression analysis that deals with categorical variables (possibly mixed with continuous ones), in combination with regularization, variable selection and high-dimensional data ($P \gg N$). Regularization and optimal scaling (OS) are two important extensions of ordinary least squares regression (OLS) that will be combined in this paper. There are two data analytic situations for which optimal scaling was developed. One is the analysis of categorical data, and the other the need for transformations because of nonlinear relationships between predictors and outcome. Optimal scaling of categorical data finds quantifications for the categories, both for the predictors and for the outcome variables, that are optimal for the regression model in the sense that they maximize the multiple correlation. When nonlinear relationships exist, nonlinear transformations of predictors and outcome maximize the multiple correlation in the same way. We will consider a variety of transformation types; typically we use step functions for categorical variables, and smooth (spline) functions for continuous variables. Both types of functions can be restricted to be monotonic, preserving the ordinal information in the data. In combination with optimal scaling, three popular regularization methods will be considered: Ridge regression, the Lasso and the Elastic Net. The resulting method will be called ROS Regression (Regularized Optimal Scaling Regression). The OS algorithm provides straightforward and efficient estimation of the regularized regression coefficients, automatically gives the Group Lasso and Blockwise Sparse Regression, and extends them with the possibility of maintaining ordinal properties in the data. Extended examples are provided. Full Article
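The simplest case of optimal scaling can be shown directly: with a single nominal predictor and a numeric outcome, replacing each category by the mean outcome within that category gives the quantification with the largest possible correlation with the outcome (the correlation ratio). The sketch below does only that, then standardizes the result. It assumes one unordered predictor and no regularization or monotonicity constraint, so it is far simpler than the ROS algorithm; the function name is hypothetical.

```python
import math
from collections import defaultdict


def optimal_nominal_quantification(categories, ys):
    """Quantify one nominal predictor to maximize its correlation with y.

    Each category is mapped to the mean of y within that category, then the
    quantified variable is standardized to mean 0 and variance 1.
    Assumes at least two categories with different y means (else sd == 0).
    """
    groups = defaultdict(list)
    for c, y in zip(categories, ys):
        groups[c].append(y)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    q = [means[c] for c in categories]
    n = len(q)
    mu = sum(q) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in q) / n)
    return {c: (m - mu) / sd for c, m in means.items()}


# Usage: three categories whose outcome means are 1, 3 and 5.
print(optimal_nominal_quantification(["a", "b", "a", "c", "b", "c"], [1, 3, 1, 5, 3, 5]))
```

With several predictors no such closed form exists, which is why optimal scaling iterates between quantifying each variable and refitting the regression.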
ma Producing Official County-Level Agricultural Estimates in the United States: Needs and Challenges By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Nathan B. Cruze, Andreea L. Erciulescu, Balgobin Nandram, Wendy J. Barboza, Linda J. Young. Source: Statistical Science, Volume 34, Number 2, 301--316.Abstract: In the United States, county-level estimates of crop yield, production, and acreage published by the United States Department of Agriculture’s National Agricultural Statistics Service (USDA NASS) play an important role in determining the value of payments allotted to farmers and ranchers enrolled in several federal programs. Given the importance of these official county-level crop estimates, NASS continually strives to improve its crops county estimates program in terms of accuracy, reliability and coverage. In 2015, NASS engaged a panel of experts convened under the auspices of the National Academies of Sciences, Engineering, and Medicine Committee on National Statistics (CNSTAT) for guidance on implementing models that may synthesize multiple sources of information into a single estimate, provide defensible measures of uncertainty, and potentially increase the number of publishable county estimates. The final report titled Improving Crop Estimates by Integrating Multiple Data Sources was released in 2017. This paper discusses several needs and requirements for NASS county-level crop estimates that were illuminated during the activities of the CNSTAT panel. A motivating example of planted acreage estimation in Illinois illustrates several challenges that NASS faces as it considers adopting any explicit model for official crops county estimates. Full Article
ma Comment: Empirical Bayes Interval Estimation By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Wenhua Jiang. Source: Statistical Science, Volume 34, Number 2, 219--223.Abstract: This is a contribution to the discussion of the enlightening paper by Professor Efron. We focus on empirical Bayes interval estimation. We discuss the oracle interval estimation rules, the empirical Bayes estimation of the oracle rule and the computation. Some numerical results are reported. Full Article
ma Comment: Minimalist $g$-Modeling By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Roger Koenker, Jiaying Gu. Source: Statistical Science, Volume 34, Number 2, 209--213.Abstract: Efron’s elegant approach to $g$-modeling for empirical Bayes problems is contrasted with an implementation of the Kiefer–Wolfowitz nonparametric maximum likelihood estimator for mixture models for several examples. The latter approach has the advantage that it is free of tuning parameters and consequently provides a relatively simple complementary method. Full Article
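A rough sketch of the Kiefer–Wolfowitz idea for a Gaussian location mixture: observations $x_i \sim N(\theta_i, 1)$ with $\theta_i \sim G$, where $G$ is restricted to a fixed grid and its weights are chosen to maximize the likelihood. EM fixed-point iterations are used here only because they are short to write; this is not necessarily the solver used in the comment, and the grid, iteration count, unit noise variance and function names are all assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kw_npmle_em(xs, grid, n_iter=500):
    """Approximate the Kiefer-Wolfowitz NPMLE of the mixing distribution G.

    G is supported on `grid`; its weights start uniform and are updated by
    EM: w_k <- mean_i [ w_k phi(x_i - u_k) / sum_j w_j phi(x_i - u_j) ].
    The grid should span the data, or the denominators can underflow to 0.
    """
    m = len(grid)
    w = [1.0 / m] * m
    lik = [[normal_pdf(x - u) for u in grid] for x in xs]  # phi(x_i - u_k)
    for _ in range(n_iter):
        new_w = [0.0] * m
        for row in lik:
            denom = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(m):
                new_w[k] += w[k] * row[k] / denom
        w = [v / len(xs) for v in new_w]
    return w


def posterior_mean(x, grid, w):
    """Empirical Bayes rule E[theta | x] under the estimated G."""
    num = sum(wk * u * normal_pdf(x - u) for wk, u in zip(w, grid))
    den = sum(wk * normal_pdf(x - u) for wk, u in zip(w, grid))
    return num / den


# Usage: two clusters of observations near -2 and +2.
grid = [i / 2 for i in range(-8, 9)]
w = kw_npmle_em([-2.1, -1.9, -2.0, 2.0, 2.2, 1.8], grid, n_iter=200)
print(posterior_mean(2.0, grid, w), posterior_mean(-2.0, grid, w))
```

Consistent with the abstract's point about tuning parameters, nothing here is tuned except the grid itself, which only needs to be fine enough and wide enough to cover the data.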
ma Gaussian Integrals and Rice Series in Crossing Distributions—to Compute the Distribution of Maxima and Other Features of Gaussian Processes By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Georg Lindgren. Source: Statistical Science, Volume 34, Number 1, 100--128.Abstract: We describe and compare how methods based on the classical Rice’s formula for the expected number, and higher moments, of level crossings by a Gaussian process stand up to contemporary numerical methods to accurately deal with crossing related characteristics of the sample paths. We illustrate the relative merits in accuracy and computing time of the Rice moment methods and the exact numerical method, developed since the late 1990s, on three groups of distribution problems, the maximum over a finite interval and the waiting time to first crossing, the length of excursions over a level, and the joint period/amplitude of oscillations. We also treat the notoriously difficult problem of dependence between successive zero crossing distances. The exact solution has been known since at least 2000, but it has remained largely unnoticed outside the ocean science community. Extensive simulation studies illustrate the accuracy of the numerical methods. As a historical introduction an attempt is made to illustrate the relation between Rice’s original formulation and arguments and the exact numerical methods. Full Article
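For reference, the first-moment version of Rice's formula gives the expected number of upcrossings of a level $u$ per unit time by a stationary, zero-mean Gaussian process with variance $\lambda_0$ and second spectral moment $\lambda_2$ (the variance of the derivative): $\frac{1}{2\pi}\sqrt{\lambda_2/\lambda_0}\,\exp\{-u^2/(2\lambda_0)\}$. The helper below simply evaluates this expression; the name and argument order are illustrative, and the higher-moment Rice series and exact numerical methods discussed in the paper are not covered.

```python
import math


def rice_upcrossing_rate(u, lambda0, lambda2):
    """Expected upcrossings of level u per unit time (Rice's formula).

    Assumes a stationary, zero-mean Gaussian process with variance lambda0
    and second spectral moment lambda2. Multiply by T for an interval of length T.
    """
    return math.sqrt(lambda2 / lambda0) / (2.0 * math.pi) * math.exp(-u * u / (2.0 * lambda0))


# Usage: X(t) = A cos t + B sin t with A, B iid N(0, 1) has lambda0 = lambda2 = 1,
# so it upcrosses zero 1 / (2 pi) times per unit time, i.e. once per period.
print(rice_upcrossing_rate(0.0, 1.0, 1.0))
```

The expected count is only an upper-bound-style starting point for quantities such as the distribution of the maximum, which is why higher moments or exact methods are needed.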
ma Comment: Contributions of Model Features to BART Causal Inference Performance Using ACIC 2016 Competition Data By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Nicole Bohme Carnegie. Source: Statistical Science, Volume 34, Number 1, 90--93.Abstract: With a thorough exposition of the methods and results of the 2016 Atlantic Causal Inference Competition, Dorie et al. have set a new standard for reproducibility and comparability of evaluations of causal inference methods. In particular, the open-source R package aciccomp2016, which permits reproduction of all datasets used in the competition, will be an invaluable resource for evaluation of future methodological developments. Building upon results from Dorie et al., we examine whether a set of potential modifications to Bayesian Additive Regression Trees (BART)—multiple chains in model fitting, using the propensity score as a covariate, targeted maximum likelihood estimation (TMLE), and computing symmetric confidence intervals—have a stronger impact on bias, RMSE, and confidence interval coverage in combination than they do alone. We find that bias in the estimate of the sample average treatment effect on the treated (SATT) is minimal, regardless of the BART formulation. For purposes of CI coverage, however, all proposed modifications are beneficial—alone and in combination—but use of TMLE is least beneficial for coverage and results in considerably wider confidence intervals. Full Article
ma Comment on “Automated Versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition” By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Susan Gruber, Mark J. van der Laan. Source: Statistical Science, Volume 34, Number 1, 82--85.Abstract: Dorie and co-authors (DHSSC) are to be congratulated for initiating the ACIC Data Challenge. Their project engaged the community and accelerated research by providing a level playing field for comparing the performance of a priori specified algorithms. DHSSC identified themes concerning characteristics of the data-generating process (DGP), properties of the estimators, and inference. We discuss these themes in the context of targeted learning. Full Article
ma Matching Methods for Causal Inference: A Review and a Look Forward By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Elizabeth A. Stuart. Source: Statistical Science, Volume 25, Number 1, 1--21.Abstract: When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how to best choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods—or developing methods related to matching—do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed. Full Article
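One of the simplest ways to choose well-matched samples is greedy 1:1 nearest-neighbor matching without replacement on a scalar distance such as an estimated propensity score. The sketch below assumes the scores are already computed and stored in dicts keyed by unit id; it is a generic illustration of that technique, not a procedure taken from the review, and the function name is hypothetical.

```python
def greedy_nn_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping unit id -> scalar score (for example an
    estimated propensity score). Treated units are processed in dict order;
    each takes the closest control that has not been used yet.
    Returns a dict mapping treated id -> matched control id.
    """
    available = dict(controls)
    pairs = {}
    for t_id, t_score in treated.items():
        if not available:
            break  # more treated than control units: the rest stay unmatched
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        del available[c_id]
    return pairs


# Usage: two treated units and three candidate controls.
print(greedy_nn_match({"t1": 0.30, "t2": 0.70}, {"c1": 0.25, "c2": 0.65, "c3": 0.90}))
```

Because each treated unit takes its best remaining control, the result can depend on processing order; optimal matching instead minimizes the total distance over all pairs.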
ma Smart women don't smoke / Biman Mullick. By search.wellcomelibrary.org Published On :: London (33 Stillness Road, London SE23 1NG) : Cleanair, Campaign for a Smoke-free Environment, [1989?] Full Article
ma We thank you for not smoking / design : Biman Mullick. By search.wellcomelibrary.org Published On :: London (33 Stillness Rd, London, SE23 1NG) : Cleanair, Campaign for a Smoke-free Environment, [198-?] Full Article
ma 'Smoke gets in your eyes' / Biman Mullick. By search.wellcomelibrary.org Published On :: London (33 Stillness Rd, London, SE23 1NG) : Cleanair, Campaign for a Smoke-free Environment, [198-?] Full Article
ma 'Smoking is slow-motion suicide' / Biman Mullick. By search.wellcomelibrary.org Published On :: London (33 Stillness Rd, London, SE23 1NG) : Cleanair, Campaign for a Smoke-free Environment, [198-?] Full Article
ma Smoking affects us all. / Biman Mullick. By search.wellcomelibrary.org Published On :: London (33 Stillness Rd, London, SE23 1NG) : Cleanair, Campaign for a Smoke-free Environment, [198-?] Full Article
ma If you must smoke don't exhale / design : Biman Mullick. By search.wellcomelibrary.org Published On :: London (33 Stillness Rd, London, SE23 1NG) : Cleanair, Campaign for a Smoke-free Environment, [198-?] Full Article
ma Passive smoking kills / Biman Mullick. By search.wellcomelibrary.org Published On :: London : Cleanair, Smoke-free Environment (33 Stillness Rd, London, SE23 1NG), [198-?] Full Article
ma Be nice to yourself and others / design : Biman Mullick. By search.wellcomelibrary.org Published On :: London : Cleanair, Smoke-free Environment (33 Stillness Rd, London, SE23 1NG), [198-?] Full Article
ma Pollution / Biman Mullick. By search.wellcomelibrary.org Published On :: London : Cleanair, Smoke-free Environment (33 Stillness Rd, London, SE23 1NG), [198-?] Full Article
ma Cleanair not smoke / design : Biman Mullick. By search.wellcomelibrary.org Published On :: London : Cleanair, Smoke-free Environment (33 Stillness Rd, London, SE23 1NG), [198-?] Full Article
ma No smoking no hate / Biman Mullick. By search.wellcomelibrary.org Published On :: London : Cleanair, Smoke-free Environment (33 Stillness Rd, London, SE23 1NG), [198-?] Full Article