ng Bayesian modeling and prior sensitivity analysis for zero–one augmented beta regression models with an application to psychometric data By projecteuclid.org Published On :: Mon, 04 May 2020 04:00 EDT Danilo Covaes Nogarotto, Caio Lucidius Naberezny Azevedo, Jorge Luis Bazán. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 304--322.Abstract: The interest on the analysis of the zero–one augmented beta regression (ZOABR) model has been increasing over the last few years. In this work, we developed a Bayesian inference for the ZOABR model, providing some contributions, namely: we explored the use of Jeffreys-rule and independence Jeffreys prior for some of the parameters, performing a sensitivity study of prior choice, comparing the Bayesian estimates with the maximum likelihood ones and measuring the accuracy of the estimates under several scenarios of interest. The results indicate, in a general way, that: the Bayesian approach, under the Jeffreys-rule prior, was as accurate as the ML one. Also, different from other approaches, we use the predictive distribution of the response to implement Bayesian residuals. To further illustrate the advantages of our approach, we conduct an analysis of a real psychometric data set including a Bayesian residual analysis, where it is shown that misleading inference can be obtained when the data is transformed. That is, when the zeros and ones are transformed to suitable values and the usual beta regression model is considered, instead of the ZOABR model. Finally, future developments are discussed. Full Article
ng Random environment binomial thinning integer-valued autoregressive process with Poisson or geometric marginal By projecteuclid.org Published On :: Mon, 04 May 2020 04:00 EDT Zhengwei Liu, Qi Li, Fukang Zhu. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 251--272.Abstract: To predict time series of counts with small values and remarkable fluctuations, an available model is the $r$ states random environment process based on the negative binomial thinning operator and the geometric marginal. However, we argue that the aforementioned model may suffer from the following two drawbacks. First, under the condition of no prior information, the overdispersed property of the geometric distribution may cause the predictions fluctuate greatly. Second, because of the constraints on the model parameters, some estimated parameters are close to zero in real-data examples, which may not objectively reveal the correlation relationship. For the first drawback, an $r$ states random environment process based on the binomial thinning operator and the Poisson marginal is introduced. For the second drawback, we propose a generalized $r$ states random environment integer-valued autoregressive model based on the binomial thinning operator to model fluctuations of data. Yule–Walker and conditional maximum likelihood estimates are considered and their performances are assessed via simulation studies. Two real-data sets are conducted to illustrate the better performances of the proposed models compared with some existing models. Full Article
ng On estimating the location parameter of the selected exponential population under the LINEX loss function By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Mohd Arshad, Omer Abdalghani. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 167--182.Abstract: Suppose that $pi_{1},pi_{2},ldots ,pi_{k}$ be $k(geq2)$ independent exponential populations having unknown location parameters $mu_{1},mu_{2},ldots,mu_{k}$ and known scale parameters $sigma_{1},ldots,sigma_{k}$. Let $mu_{[k]}=max {mu_{1},ldots,mu_{k}}$. For selecting the population associated with $mu_{[k]}$, a class of selection rules (proposed by Arshad and Misra [ Statistical Papers 57 (2016) 605–621]) is considered. We consider the problem of estimating the location parameter $mu_{S}$ of the selected population under the criterion of the LINEX loss function. We consider three natural estimators $delta_{N,1},delta_{N,2}$ and $delta_{N,3}$ of $mu_{S}$, based on the maximum likelihood estimators, uniformly minimum variance unbiased estimator (UMVUE) and minimum risk equivariant estimator (MREE) of $mu_{i}$’s, respectively. The uniformly minimum risk unbiased estimator (UMRUE) and the generalized Bayes estimator of $mu_{S}$ are derived. Under the LINEX loss function, a general result for improving a location-equivariant estimator of $mu_{S}$ is derived. Using this result, estimator better than the natural estimator $delta_{N,1}$ is obtained. We also shown that the estimator $delta_{N,1}$ is dominated by the natural estimator $delta_{N,3}$. Finally, we perform a simulation study to evaluate and compare risk functions among various competing estimators of $mu_{S}$. Full Article
ng A primer on the characterization of the exchangeable Marshall–Olkin copula via monotone sequences By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Natalia Shenkman. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 127--135.Abstract: While derivations of the characterization of the $d$-variate exchangeable Marshall–Olkin copula via $d$-monotone sequences relying on basic knowledge in probability theory exist in the literature, they contain a myriad of unnecessary relatively complicated computations. We revisit this issue and provide proofs where all undesired artefacts are removed, thereby exposing the simplicity of the characterization. In particular, we give an insightful analytical derivation of the monotonicity conditions based on the monotonicity properties of the survival probabilities. Full Article
ng Robust Bayesian model selection for heavy-tailed linear regression using finite mixtures By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Flávio B. Gonçalves, Marcos O. Prates, Victor Hugo Lachos. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 51--70.Abstract: In this paper, we present a novel methodology to perform Bayesian model selection in linear models with heavy-tailed distributions. We consider a finite mixture of distributions to model a latent variable where each component of the mixture corresponds to one possible model within the symmetrical class of normal independent distributions. Naturally, the Gaussian model is one of the possibilities. This allows for a simultaneous analysis based on the posterior probability of each model. Inference is performed via Markov chain Monte Carlo—a Gibbs sampler with Metropolis–Hastings steps for a class of parameters. Simulated examples highlight the advantages of this approach compared to a segregated analysis based on arbitrarily chosen model selection criteria. Examples with real data are presented and an extension to censored linear regression is introduced and discussed. Full Article
ng A joint mean-correlation modeling approach for longitudinal zero-inflated count data By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Weiping Zhang, Jiangli Wang, Fang Qian, Yu Chen. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 35--50.Abstract: Longitudinal zero-inflated count data are widely encountered in many fields, while modeling the correlation between measurements for the same subject is more challenge due to the lack of suitable multivariate joint distributions. This paper studies a novel mean-correlation modeling approach for longitudinal zero-inflated regression model, solving both problems of specifying joint distribution and parsimoniously modeling correlations with no constraint. The joint distribution of zero-inflated discrete longitudinal responses is modeled by a copula model whose correlation parameters are innovatively represented in hyper-spherical coordinates. To overcome the computational intractability in maximizing the full likelihood function of the model, we further propose a computationally efficient pairwise likelihood approach. We then propose separated mean and correlation regression models to model these key quantities, such modeling approach can also handle irregularly and possibly subject-specific times points. The resulting estimators are shown to be consistent and asymptotically normal. Data example and simulations support the effectiveness of the proposed approach. Full Article
ng Bootstrap-based testing inference in beta regressions By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Fábio P. Lima, Francisco Cribari-Neto. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 18--34.Abstract: We address the issue of performing testing inference in small samples in the class of beta regression models. We consider the likelihood ratio test and its standard bootstrap version. We also consider two alternative resampling-based tests. One of them uses the bootstrap test statistic replicates to numerically estimate a Bartlett correction factor that can be applied to the likelihood ratio test statistic. By doing so, we avoid estimation of quantities located in the tail of the likelihood ratio test statistic null distribution. The second alternative resampling-based test uses a fast double bootstrap scheme in which a single second level bootstrapping resample is performed for each first level bootstrap replication. It delivers accurate testing inferences at a computational cost that is considerably smaller than that of a standard double bootstrapping scheme. The Monte Carlo results we provide show that the standard likelihood ratio test tends to be quite liberal in small samples. They also show that the bootstrap tests deliver accurate testing inferences even when the sample size is quite small. An empirical application is also presented and discussed. Full Article
ng Subjective Bayesian testing using calibrated prior probabilities By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Dan J. Spitzner. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 861--893.Abstract: This article proposes a calibration scheme for Bayesian testing that coordinates analytically-derived statistical performance considerations with expert opinion. In other words, the scheme is effective and meaningful for incorporating objective elements into subjective Bayesian inference. It explores a novel role for default priors as anchors for calibration rather than substitutes for prior knowledge. Ideas are developed for use with multiplicity adjustments in multiple-model contexts, and to address the issue of prior sensitivity of Bayes factors. Along the way, the performance properties of an existing multiplicity adjustment related to the Poisson distribution are clarified theoretically. Connections of the overall calibration scheme to the Schwarz criterion are also explored. The proposed framework is examined and illustrated on a number of existing data sets related to problems in clinical trials, forensic pattern matching, and log-linear models methodology. Full Article
ng Option pricing with bivariate risk-neutral density via copula and heteroscedastic model: A Bayesian approach By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Lucas Pereira Lopes, Vicente Garibay Cancho, Francisco Louzada. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 801--825.Abstract: Multivariate options are adequate tools for multi-asset risk management. The pricing models derived from the pioneer Black and Scholes method under the multivariate case consider that the asset-object prices follow a Brownian geometric motion. However, the construction of such methods imposes some unrealistic constraints on the process of fair option calculation, such as constant volatility over the maturity time and linear correlation between the assets. Therefore, this paper aims to price and analyze the fair price behavior of the call-on-max (bivariate) option considering marginal heteroscedastic models with dependence structure modeled via copulas. Concerning inference, we adopt a Bayesian perspective and computationally intensive methods based on Monte Carlo simulations via Markov Chain (MCMC). A simulation study examines the bias, and the root mean squared errors of the posterior means for the parameters. Real stocks prices of Brazilian banks illustrate the approach. For the proposed method is verified the effects of strike and dependence structure on the fair price of the option. The results show that the prices obtained by our heteroscedastic model approach and copulas differ substantially from the prices obtained by the model derived from Black and Scholes. Empirical results are presented to argue the advantages of our strategy. Full Article
ng Bayesian modelling of the abilities in dichotomous IRT models via regression with missing values in the covariates By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Flávio B. Gonçalves, Bárbara C. C. Dias. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 782--800.Abstract: Educational assessment usually considers a contextual questionnaire to extract relevant information from the applicants. This may include items related to socio-economical profile as well as items to extract other characteristics potentially related to applicant’s performance in the test. A careful analysis of the questionnaires jointly with the test’s results may evidence important relations between profiles and test performance. The most coherent way to perform this task in a statistical context is to use the information from the questionnaire to help explain the variability of the abilities in a joint model-based approach. Nevertheless, the responses to the questionnaire typically present missing values which, in some cases, may be missing not at random. This paper proposes a statistical methodology to model the abilities in dichotomous IRT models using the information of the contextual questionnaires via linear regression. The proposed methodology models the missing data jointly with the all the observed data, which allows for the estimation of the former. The missing data modelling is flexible enough to allow the specification of missing not at random structures. Furthermore, even if those structures are not assumed a priori, they can be estimated from the posterior results when assuming missing (completely) at random structures a priori. Statistical inference is performed under the Bayesian paradigm via an efficient MCMC algorithm. Simulated and real examples are presented to investigate the efficiency and applicability of the proposed methodology. Full Article
ng Bayesian hypothesis testing: Redux By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Hedibert F. Lopes, Nicholas G. Polson. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 745--755.Abstract: Bayesian hypothesis testing is re-examined from the perspective of an a priori assessment of the test statistic distribution under the alternative. By assessing the distribution of an observable test statistic, rather than prior parameter values, we revisit the seminal paper of Edwards, Lindman and Savage ( Psychol. Rev. 70 (1963) 193–242). There are a number of important take-aways from comparing the Bayesian paradigm via Bayes factors to frequentist ones. We provide examples where evidence for a Bayesian strikingly supports the null, but leads to rejection under a classical test. Finally, we conclude with directions for future research. Full Article
ng The limiting distribution of the Gibbs sampler for the intrinsic conditional autoregressive model By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Marco A. R. Ferreira. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 734--744.Abstract: We study the limiting behavior of the one-at-a-time Gibbs sampler for the intrinsic conditional autoregressive model with centering on the fly. The intrinsic conditional autoregressive model is widely used as a prior for random effects in hierarchical models for spatial modeling. This model is defined by full conditional distributions that imply an improper joint “density” with a multivariate Gaussian kernel and a singular precision matrix. To guarantee propriety of the posterior distribution, usually at the end of each iteration of the Gibbs sampler the random effects are centered to sum to zero in what is widely known as centering on the fly. While this works well in practice, this informal computational way to recenter the random effects obscures their implied prior distribution and prevents the development of formal Bayesian procedures. Here we show that the implied prior distribution, that is, the limiting distribution of the one-at-a-time Gibbs sampler for the intrinsic conditional autoregressive model with centering on the fly is a singular Gaussian distribution with a covariance matrix that is the Moore–Penrose inverse of the precision matrix. This result has important implications for the development of formal Bayesian procedures such as reference priors and Bayes-factor-based model selection for spatial models. Full Article
ng Keeping the balance—Bridge sampling for marginal likelihood estimation in finite mixture, mixture of experts and Markov mixture models By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Sylvia Frühwirth-Schnatter. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 706--733.Abstract: Finite mixture models and their extensions to Markov mixture and mixture of experts models are very popular in analysing data of various kind. A challenge for these models is choosing the number of components based on marginal likelihoods. The present paper suggests two innovative, generic bridge sampling estimators of the marginal likelihood that are based on constructing balanced importance densities from the conditional densities arising during Gibbs sampling. The full permutation bridge sampling estimator is derived from considering all possible permutations of the mixture labels for a subset of these densities. For the double random permutation bridge sampling estimator, two levels of random permutations are applied, first to permute the labels of the MCMC draws and second to randomly permute the labels of the conditional densities arising during Gibbs sampling. Various applications show very good performance of these estimators in comparison to importance and to reciprocal importance sampling estimators derived from the same importance densities. Full Article
ng Influence measures for the Waring regression model By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Luisa Rivas, Manuel Galea. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 402--424.Abstract: In this paper, we present a regression model where the response variable is a count data that follows a Waring distribution. The Waring regression model allows for analysis of phenomena where the Geometric regression model is inadequate, because the probability of success on each trial, $p$, is different for each individual and $p$ has an associated distribution. Estimation is performed by maximum likelihood, through the maximization of the $Q$-function using EM algorithm. Diagnostic measures are calculated for this model. To illustrate the results, an application to real data is presented. Some specific details are given in the Appendix of the paper. Full Article
ng Hierarchical modelling of power law processes for the analysis of repairable systems with different truncation times: An empirical Bayes approach By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Rodrigo Citton P. dos Reis, Enrico A. Colosimo, Gustavo L. Gilardoni. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 374--396.Abstract: In the data analysis from multiple repairable systems, it is usual to observe both different truncation times and heterogeneity among the systems. Among other reasons, the latter is caused by different manufacturing lines and maintenance teams of the systems. In this paper, a hierarchical model is proposed for the statistical analysis of multiple repairable systems under different truncation times. A reparameterization of the power law process is proposed in order to obtain a quasi-conjugate bayesian analysis. An empirical Bayes approach is used to estimate model hyperparameters. The uncertainty in the estimate of these quantities are corrected by using a parametric bootstrap approach. The results are illustrated in a real data set of failure times of power transformers from an electric company in Brazil. Full Article
ng Necessary and sufficient conditions for the convergence of the consistent maximal displacement of the branching random walk By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Bastien Mallein. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 356--373.Abstract: Consider a supercritical branching random walk on the real line. The consistent maximal displacement is the smallest of the distances between the trajectories followed by individuals at the $n$th generation and the boundary of the process. Fang and Zeitouni, and Faraud, Hu and Shi proved that under some integrability conditions, the consistent maximal displacement grows almost surely at rate $lambda^{*}n^{1/3}$ for some explicit constant $lambda^{*}$. We obtain here a necessary and sufficient condition for this asymptotic behaviour to hold. Full Article
ng Failure rate of Birnbaum–Saunders distributions: Shape, change-point, estimation and robustness By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Emilia Athayde, Assis Azevedo, Michelli Barros, Víctor Leiva. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 301--328.Abstract: The Birnbaum–Saunders (BS) distribution has been largely studied and applied. A random variable with BS distribution is a transformation of another random variable with standard normal distribution. Generalized BS distributions are obtained when the normally distributed random variable is replaced by another symmetrically distributed random variable. This allows us to obtain a wide class of positively skewed models with lighter and heavier tails than the BS model. Its failure rate admits several shapes, including the unimodal case, with its change-point being able to be used for different purposes. For example, to establish the reduction in a dose, and then in the cost of the medical treatment. We analyze the failure rates of generalized BS distributions obtained by the logistic, normal and Student-t distributions, considering their shape and change-point, estimating them, evaluating their robustness, assessing their performance by simulations, and applying the results to real data from different areas. Full Article
ng Modified information criterion for testing changes in skew normal model By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Khamis K. Said, Wei Ning, Yubin Tian. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 280--300.Abstract: In this paper, we study the change point problem for the skew normal distribution model from the view of model selection problem. The detection procedure based on the modified information criterion (MIC) for change problem is proposed. Such a procedure has advantage in detecting the changes in early and late stage of a data comparing to the one based on the traditional Schwarz information criterion which is well known as Bayesian information criterion (BIC) by considering the complexity of the models. Due to the difficulty in deriving the analytic asymptotic distribution of the test statistic based on the MIC procedure, the bootstrap simulation is provided to obtain the critical values at the different significance levels. Simulations are conducted to illustrate the comparisons of performance between MIC, BIC and likelihood ratio test (LRT). Such an approach is applied on two stock market data sets to indicate the detection procedure. Full Article
ng A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Kushal K. Dey, Sourabh Bhattacharya. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 222--266.Abstract: Transformation based Markov Chain Monte Carlo (TMCMC) was proposed by Dutta and Bhattacharya ( Statistical Methodology 16 (2014) 100–116) as an efficient alternative to the Metropolis–Hastings algorithm, especially in high dimensions. The main advantage of this algorithm is that it simultaneously updates all components of a high dimensional parameter using appropriate move types defined by deterministic transformation of a single random variable. This results in reduction in time complexity at each step of the chain and enhances the acceptance rate. In this paper, we first provide a brief review of the optimal scaling theory for various existing MCMC approaches, comparing and contrasting them with the corresponding TMCMC approaches.The optimal scaling of the simplest form of TMCMC, namely additive TMCMC , has been studied extensively for the Gaussian proposal density in Dey and Bhattacharya (2017a). Here, we discuss diffusion-based optimal scaling behavior of additive TMCMC for non-Gaussian proposal densities—in particular, uniform, Student’s $t$ and Cauchy proposals. Although we could not formally prove our diffusion result for the Cauchy proposal, simulation based results lead us to conjecture that at least the recipe for obtaining general optimal scaling and optimal acceptance rate holds for the Cauchy case as well. We also consider diffusion based optimal scaling of TMCMC when the target density is discontinuous. Such non-regular situations have been studied in the case of Random Walk Metropolis Hastings (RWMH) algorithm by Neal and Roberts ( Methodology and Computing in Applied Probability 13 (2011) 583–601) using expected squared jumping distance (ESJD), but the diffusion theory based scaling has not been considered. We compare our diffusion based optimally scaled TMCMC approach with the ESJD based optimally scaled RWM with simulation studies involving several target distributions and proposal distributions including the challenging Cauchy proposal case, showing that additive TMCMC outperforms RWMH in almost all cases considered. Full Article
ng Simple tail index estimation for dependent and heterogeneous data with missing values By projecteuclid.org Published On :: Mon, 14 Jan 2019 04:01 EST Ivana Ilić, Vladica M. Veličković. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 1, 192--203.Abstract: Financial returns are known to be nonnormal and tend to have fat-tailed distribution. Also, the dependence of large values in a stochastic process is an important topic in risk, insurance and finance. In the presence of missing values, we deal with the asymptotic properties of a simple “median” estimator of the tail index based on random variables with the heavy-tailed distribution function and certain dependence among the extremes. Weak consistency and asymptotic normality of the proposed estimator are established. The estimator is a special case of a well-known estimator defined in Bacro and Brito [ Statistics & Decisions 3 (1993) 133–143]. The advantage of the estimator is its robustness against deviations and compared to Hill’s, it is less affected by the fluctuations related to the maximum of the sample or by the presence of outliers. Several examples are analyzed in order to support the proofs. Full Article
ng NDN coping mechanisms : notes from the field By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Belcourt, Billy-Ray, author.Callnumber: PS 8603 E516 N46 2019ISBN: 9781487005771 (softcover) Full Article
ng Any waking morning By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Soutar-Hynes, Mary Lou, author.Callnumber: PS 8587 O9756 A85 2019ISBN: 9781771336413 (softcover) Full Article
ng Reclaiming indigenous governance : reflections and insights from Australia, Canada, New Zealand, and the United States By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Callnumber: K 3247 R43 2019ISBN: 9780816539970 (paperback) Full Article
ng Globalizing capital : a history of the international monetary system By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Eichengreen, Barry J., author.Callnumber: HG 3881 E347 2019ISBN: 9780691193908 (paperback) Full Article
ng Documenting rebellions : a study of four lesbian and gay archives in queer times By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Sheffield, Rebecka Taves, author.Callnumber: CD 3021 S45 2020ISBN: 9781634000918 paperback Full Article
ng Figuring racism in medieval Christianity By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Kaplan, M. Lindsay, author.Callnumber: BT 734.2 K354 2019ISBN: 9780190678241 hardcover alkaline paper Full Article
ng Can $p$-values be meaningfully interpreted without random sampling? By projecteuclid.org Published On :: Thu, 26 Mar 2020 22:02 EDT Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker, Antje Jantsch. Source: Statistics Surveys, Volume 14, 71--91.Abstract: Besides the inferential errors that abound in the interpretation of $p$-values, the probabilistic pre-conditions (i.e. random sampling or equivalent) for using them at all are not often met by observational studies in the social sciences. This paper systematizes different sampling designs and discusses the restrictive requirements of data collection that are the indispensable prerequisite for using $p$-values. Full Article
ng Estimating the size of a hidden finite set: Large-sample behavior of estimators By projecteuclid.org Published On :: Fri, 03 Jan 2020 22:02 EST Si Cheng, Daniel J. Eck, Forrest W. Crawford. Source: Statistics Surveys, Volume 14, 1--31.Abstract: A finite set is “hidden” if its elements are not directly enumerable or if its size cannot be ascertained via a deterministic query. In public health, epidemiology, demography, ecology and intelligence analysis, researchers have developed a wide variety of indirect statistical approaches, under different models for sampling and observation, for estimating the size of a hidden set. Some methods make use of random sampling with known or estimable sampling probabilities, and others make structural assumptions about relationships (e.g. ordering or network information) between the elements that comprise the hidden set. In this review, we describe models and methods for learning about the size of a hidden finite set, with special attention to asymptotic properties of estimators. We study the properties of these methods under two asymptotic regimes, “infill” in which the number of fixed-size samples increases, but the population size remains constant, and “outfill” in which the sample size and population size grow together. Statistical properties under these two regimes can be dramatically different. Full Article
ng Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists By projecteuclid.org Published On :: Tue, 05 Nov 2019 22:03 EST John J. Dziak, Donna L. Coffman, Matthew Reimherr, Justin Petrovich, Runze Li, Saul Shiffman, Mariya P. Shiyko. Source: Statistics Surveys, Volume 13, 150--180.Abstract: Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study. Full Article
ng Halfspace depth and floating body By projecteuclid.org Published On :: Fri, 21 Jun 2019 22:03 EDT Stanislav Nagy, Carsten Schütt, Elisabeth M. Werner. Source: Statistics Surveys, Volume 13, 52--118.Abstract: Little known relations of the renown concept of the halfspace depth for multivariate data with notions from convex and affine geometry are discussed. Maximum halfspace depth may be regarded as a measure of symmetry for random vectors. As such, the maximum depth stands as a generalization of a measure of symmetry for convex sets, well studied in geometry. Under a mild assumption, the upper level sets of the halfspace depth coincide with the convex floating bodies of measures used in the definition of the affine surface area for convex bodies in Euclidean spaces. These connections enable us to partially resolve some persistent open problems regarding theoretical properties of the depth. Full Article
ng Pitfalls of significance testing and $p$-value variability: An econometrics perspective By projecteuclid.org Published On :: Wed, 03 Oct 2018 22:00 EDT Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker. Source: Statistics Surveys, Volume 12, 136--172.Abstract: Data on how many scientific findings are reproducible are generally bleak and a wealth of papers have warned against misuses of the $p$-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely considered as the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed (“classical”) pitfalls of significance testing, and review published work on these misuses with a focus on regression-based “confirmatory” study. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts (“vote counting”). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate “$p$-value” as a measure of evidence without considering its sample-to-sample variability falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms. Full Article
ng Variable selection methods for model-based clustering By projecteuclid.org Published On :: Thu, 26 Apr 2018 04:00 EDT Michael Fop, Thomas Brendan Murphy. Source: Statistics Surveys, Volume 12, 18--65.Abstract: Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples. Full Article
ng A design-sensitive approach to fitting regression models with complex survey data By projecteuclid.org Published On :: Wed, 17 Jan 2018 04:00 EST Phillip S. Kott. Source: Statistics Surveys, Volume 12, 1--17.Abstract: Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework. Full Article
ng Measuring multivariate association and beyond By projecteuclid.org Published On :: Wed, 16 Nov 2016 22:00 EST Julie Josse, Susan Holmes. Source: Statistics Surveys, Volume 10, 132--167.Abstract: Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association’s underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarii where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research. Full Article
ng A survey of bootstrap methods in finite population sampling By projecteuclid.org Published On :: Tue, 15 Mar 2016 09:17 EDT Zeinab Mashreghi, David Haziza, Christian Léger. Source: Statistics Surveys, Volume 10, 1--52.Abstract: We review bootstrap methods in the context of survey data where the effect of the sampling design on the variability of estimators has to be taken into account. We present the methods in a unified way by classifying them in three classes: pseudo-population, direct, and survey weights methods. We cover variance estimation and the construction of confidence intervals for stratified simple random sampling as well as some unequal probability sampling designs. We also address the problem of variance estimation in presence of imputation to compensate for item non-response. Full Article
ng Log-concavity and strong log-concavity: A review By projecteuclid.org Published On :: Tue, 09 Dec 2014 09:09 EST Adrien Saumard, Jon A. Wellner. Source: Statistics Surveys, Volume 8, 45--114.Abstract: We review and formulate results concerning log-concavity and strong-log-concavity in both discrete and continuous settings. We show how preservation of log-concavity and strong log-concavity on $mathbb{R}$ under convolution follows from a fundamental monotonicity result of Efron (1965). We provide a new proof of Efron’s theorem using the recent asymmetric Brascamp-Lieb inequality due to Otto and Menz (2013). Along the way we review connections between log-concavity and other areas of mathematics and statistics, including concentration of measure, log-Sobolev inequalities, convex geometry, MCMC algorithms, Laplace approximations, and machine learning. Full Article
ng Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain By projecteuclid.org Published On :: Mon, 28 Oct 2013 09:06 EDT Sean L. Simpson, F. DuBois Bowman, Paul J. LaurientiSource: Statist. Surv., Volume 7, 1--36.Abstract: Complex functional brain network analyses have exploded over the last decade, gaining traction due to their profound clinical implications. The application of network science (an interdisciplinary offshoot of graph theory) has facilitated these analyses and enabled examining the brain as an integrated system that produces complex behaviors. While the field of statistics has been integral in advancing activation analyses and some connectivity analyses in functional neuroimaging research, it has yet to play a commensurate role in complex network analyses. Fusing novel statistical methods with network-based functional neuroimage analysis will engender powerful analytical tools that will aid in our understanding of normal brain function as well as alterations due to various brain disorders. Here we survey widely used statistical and network science tools for analyzing fMRI network data and discuss the challenges faced in filling some of the remaining methodological gaps. When applied and interpreted correctly, the fusion of network scientific and statistical methods has a chance to revolutionize the understanding of brain function. Full Article
ng The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy By projecteuclid.org Published On :: Tue, 16 Oct 2012 09:36 EDT Nancy HeckmanSource: Statist. Surv., Volume 6, 113--141.Abstract: The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $sum_{j}(Y_{j}-mu(t_{j}))^{2}+lambda int_{a}^{b}[mu''(t)]^{2},dt$, where the data are $t_{j},Y_{j}$, $j=1,ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite dimensional space to minimizing over a finite dimensional space occurs for more general objective functions: the data may be related to the function $mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $int_{a}^{b}[mu''(t)]^{2},dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green’s functions. Full Article
ng Statistical inference for disordered sphere packings By projecteuclid.org Published On :: Thu, 19 Jul 2012 08:37 EDT Jeffrey PickaSource: Statist. Surv., Volume 6, 74--112.Abstract: This paper gives an overview of statistical inference for disordered sphere packing processes. These processes are used extensively in physics and engineering in order to represent the internal structure of composite materials, packed bed reactors, and powders at rest, and are used as initial arrangements of grains in the study of avalanches and other problems involving powders in motion. Packing processes are spatial processes which are neither stationary nor ergodic. Classical spatial statistical models and procedures cannot be applied to these processes, but alternative models and procedures can be developed based on ideas from statistical physics. Most of the development of models and statistics for sphere packings has been undertaken by scientists and engineers. This review summarizes their results from an inferential perspective. Full Article
ng Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy By projecteuclid.org Published On :: Fri, 04 Feb 2011 09:16 EST Gregory J. Matthews, Ofer HarelSource: Statist. Surv., Volume 5, 1--29.Abstract: There is an ever increasing demand from researchers for access to useful microdata files. However, there are also growing concerns regarding the privacy of the individuals contained in the microdata. Ideally, microdata could be released in such a way that a balance between usefulness of the data and privacy is struck. This paper presents a review of proposed methods of statistical disclosure control and techniques for assessing the privacy of such methods under different definitions of disclosure. References:Abowd, J., Woodcock, S., 2001. Disclosure limitation in longitudinal linked data. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, 215–277.Adam, N.R., Worthmann, J.C., 1989. Security-control methods for statistical databases: a comparative study. ACM Comput. Surv. 21 (4), 515–556.Armstrong, M., Rushton, G., Zimmerman, D.L., 1999. Geographically masking health data to preserve confidentiality. Statistics in Medicine 18 (5), 497–525.Bethlehem, J.G., Keller, W., Pannekoek, J., 1990. Disclosure control of microdata. Jorunal of the American Statistical Association 85, 38–45.Blum, A., Dwork, C., McSherry, F., Nissam, K., 2005. Practical privacy: The sulq framework. In: Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. pp. 128–138.Bowden, R.J., Sim, A.B., 1992. The privacy bootstrap. Journal of Business and Economic Statistics 10 (3), 337–345.Carlson, M., Salabasis, M., 2002. A data-swapping technique for generating synthetic samples; a method for disclosure control. Res. Official Statist. (5), 35–64.Cox, L.H., 1980. Suppression methodology and statistical disclosure control. Journal of the American Statistical Association 75, 377–385.Cox, L.H., 1984. Disclosure control methods for frequency count data. Tech. rep., U.S. Bureau of the Census.Cox, L.H., 1987. A constructive procedure for unbiased controlled rounding. Journal of the American Statistical Association 82, 520–524.Cox, L.H., 1994. Matrix masking methods for disclosure limitation in microdata. Survey Methodology 6, 165–169.Cox, L.H., Fagan, J.T., Greenberg, B., Hemmig, R., 1987. Disclosure avoidance techniques for tabular data. Tech. rep., U.S. Bureau of the Census.Dalenius, T., 1977. Towards a methodology for statistical disclosure control. Statistik Tidskrift 15, 429–444.Dalenius, T., 1986. Finding a needle in a haystack - or identifying anonymous census record. Journal of Official Statistics 2 (3), 329–336.Dalenius, T., Denning, D., 1982. A hybrid scheme for release of statistics. Statistisk Tidskrift.Dalenius, T., Reiss, S.P., 1982. Data-swapping: A technique for disclosure control. Journal of Statistical Planning and Inference 6, 73–85.De Waal, A., Hundepool, A., Willenborg, L., 1995. Argus: Software for statistical disclosure control of microdata. U.S. Census Bureau.DeGroot, M.H., 1962. Uncertainty, information, and sequential experiments. Annals of Mathematical Statistics 33, 404–419.DeGroot, M.H., 1970. Optimal Statistical Decisions. Mansell, London.Dinur, I., Nissam, K., 2003. Revealing information while preserving privacy. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principlesof Database Systems. pp. 202–210.Domingo-Ferrer, J., Torra, V., 2001a. A Quantitative Comparison of Disclosure Control Methods for Microdata. In: Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (Eds.), Confidentiality, Disclosure and Data Access - Theory and Practical Applications for Statistical Agencies. North-Holland, Amsterdam, Ch. 6, pp. 113–135.Domingo-Ferrer, J., Torra, V., 2001b. Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (Eds.), Confidentiality, Disclosure and Data Access - Theory and Practical Applications for Statistical Agencies. North-Holland, Amsterdam, Ch. 5, pp. 93–112.Duncan, G., Lambert, D., 1986. Disclosure-limited data dissemination. Journal of the American Statistical Association 81, 10–28.Duncan, G., Lambert, D., 1989. The risk of disclosure for microdata. Journal of Business & Economic Statistics 7, 207–217. Duncan, G., Pearson, R., 1991. Enhancing access to microdata while protecting confidentiality: prospects for the future (with discussion). Statistical Science 6, 219–232.Dwork, C., 2006. Differential privacy. In: ICALP. Springer, pp. 1–12.Dwork, C., 2008. An ad omnia approach to defining and achieving private data analysis. In: Lecture Notes in Computer Science. Springer, p. 10.Dwork, C., Lei, J., 2009. Differential privacy and robust statistics. In: Proceedings of the 41th Annual ACM Symposium on Theory of Computing (STOC). pp. 371–380.Dwork, C., Mcsherry, F., Nissim, K., Smith, A., 2006. Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference. Springer, pp. 265–284.Dwork, C., Nissam, K., 2004. Privacy-preserving datamining on vertically partitioned databases. In: Advances in Cryptology: Proceedings of Crypto. pp. 528–544.Elliot, M., 2000. DIS: a new approach to the measurement of statistical disclosure risk. International Journal of Risk Assessment and Management 2, 39–48.Federal Committee on Statistical Methodology (FCSM), 2005. Statistical policy working group 22 - report on statistical disclosure limitation methodology. U.S. Census Bureau.Fellegi, I.P., 1972. On the question of statistical confidentiality. Journal of the American Statistical Association 67 (337), 7–18.Fienberg, S.E., McIntyre, J., 2004. Data swapping: Variations on a theme by Dalenius and Reiss. In: Domingo-Ferrer, J., Torra, V. (Eds.), Privacy in Statistical Databases. Vol. 3050 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, pp. 519, http://dx.doi.org/10.1007/ 978-3-540-25955-8_2Fuller, W., 1993. Masking procedurse for microdata disclosure limitation. Journal of Official Statistics 9, 383–406.General Assembly of the United Nations, 1948. Universal declaration of human rights.Gouweleeuw, J., P. Kooiman, L.W., de Wolf, P.-P., 1998. Post randomisation for statistical disclosure control: Theory and implementation. Journal of Official Statistics 14 (4), 463–478.Greenberg, B., 1987. Rank swapping for masking ordinal microdata. Tech. rep., U.S. Bureau of the Census (unpublished manuscript), Suitland, Maryland, USA.Greenberg, B.G., Abul-Ela, A.-L.A., Simmons, W.R., Horvitz, D.G., 1969. The unrelated question randomized response model: Theoretical framework. Journal of the American Statistical Association 64 (326), 520–539.Harel, O., Zhou, X.-H., 2007. Multiple imputation: Review and theory, implementation and software. Statistics in Medicine 26, 3057–3077. Hundepool, A., Domingo-ferrer, J., Franconi, L., Giessing, S., Lenz, R., Longhurst, J., Nordholt, E.S., Seri, G., paul De Wolf, P., 2006. A CENtre of EXcellence for Statistical Disclosure Control Handbook on Statistical Disclosure Control Version 1.01.Hundepool, A., Wetering, A. v.d., Ramaswamy, R., Wolf, P.d., Giessing, S., Fischetti, M., Salazar, J., Castro, J., Lowthian, P., Feb. 2005. τ-argus 3.1 user manual. Statistics Netherlands, Voorburg NL.Hundepool, A., Willenborg, L., 1996. μ- and τ-argus: Software for statistical disclosure control. Third International Seminar on Statistical Confidentiality, Bled.Karr, A., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P., 2006. A framework for evaluating the utility of data altered to protect confidentiality. American Statistician 60 (3), 224–232.Kaufman, S., Seastrom, M., Roey, S., 2005. Do disclosure controls to protect confidentiality degrade the quality of the data? In: American Statistical Association, Proceedings of the Section on Survey Research.Kennickell, A.B., 1997. Multiple imputation and disclosure protection: the case of the 1995 survey of consumer finances. Record Linkage Techniques, 248–267.Kim, J., 1986. Limiting disclosure in microdata based on random noise and transformation. Bureau of the Census.Krumm, J., 2007. Inference attacks on location tracks. Proceedings of Fifth International Conference on Pervasive Computingy, 127–143.Li, N., Li, T., Venkatasubramanian, S., 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. In: Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on. pp. 106–115.Liew, C.K., Choi, U.J., Liew, C.J., 1985. A data distortion by probability distribution. ACM Trans. Database Syst. 10 (3), 395–411.Little, R.J.A., 1993. Statistical analysis of masked data. Journal of Official Statistics 9, 407–426.Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. John Wiley & Sons.Liu, F., Little, R.J.A., 2002. Selective multiple mputation of keys for statistical disclosure control in microdata. In: Proceedings Joint Statistical Meet. pp. 2133–2138.Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J., Vilhuber, L., April 2008. Privacy: Theory meets practice on the map. In: International Conference on Data Engineering. Cornell University Comuputer Science Department, Cornell, USA, p. 10.Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M., 2007. L-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1 (1), 3.Manning, A.M., Haglin, D.J., Keane, J.A., 2008. A recursive search algorithm for statistical disclosure assessment. Data Min. Knowl. Discov. 16 (2), 165–196. Marsh, C., Skinner, C., Arber, S., Penhale, B., Openshaw, S., Hobcraft, J., Lievesley, D., Walford, N., 1991. The case for samples of anonymized records from the 1991 census. Journal of the Royal Statistical Society 154 (2), 305–340.Matthews, G.J., Harel, O., Aseltine, R.H., 2010a. Assessing database privacy using the area under the receiver-operator characteristic curve. Health Services and Outcomes Research Methodology 10 (1), 1–15.Matthews, G.J., Harel, O., Aseltine, R.H., 2010b. Examining the robustness of fully synthetic data techniques for data with binary variables. Journal of Statistical Computation and Simulation 80 (6), 609–624.Moore, Jr., R., 1996. Controlled data-swapping techniques for masking public use microdata. Census Tech Report.Mugge, R., 1983. Issues in protecting confidentiality in national health statistics. Proceedings of the Section on Survey Research Methods.Nissim, K., Raskhodnikova, S., Smith, A., 2007. Smooth sensitivity and sampling in private data analysis. In: STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing. pp. 75–84.Paass, G., 1988. Disclosure risk and disclosure avoidance for microdata. Journal of Business and Economic Statistics 6 (4), 487–500.Palley, M., Simonoff, J., 1987. The use of regression methodology for the compromise of confidential information in statistical databases. ACM Trans. Database Systems 12 (4), 593–608.Raghunathan, T.E., Reiter, J.P., Rubin, D.B., 2003. Multiple imputation for statistical disclosure limitation. Journal of Official Statistics 19 (1), 1–16.Rajasekaran, S., Harel, O., Zuba, M., Matthews, G.J., Aseltine, Jr., R., 2009. Responsible data releases. In: Proceedings 9th Industrial Conference on Data Mining (ICDM). Springer LNCS, pp. 388–400.Reiss, S.P., 1984. Practical data-swapping: The first steps. CM Transactions on Database Systems 9, 20–37.Reiter, J.P., 2002. Satisfying disclosure restriction with synthetic data sets. Journal of Official Statistics 18 (4), 531–543.Reiter, J.P., 2003. Inference for partially synthetic, public use microdata sets. Survey Methodology 29 (2), 181–188.Reiter, J.P., 2004a. New approaches to data dissemination: A glimpse into the future (?). Chance 17 (3), 11–15.Reiter, J.P., 2004b. Simultaneous use of multiple imputation for missing data and disclosure limitation. Survey Methodology 30 (2), 235–242.Reiter, J.P., 2005a. Estimating risks of identification disclosure in microdata. Journal of the American Statistical Association 100, 1103–1112.Reiter, J.P., 2005b. Releasing multiply imputed, synthetic public use microdata: An illustration and empirical study. Journal of the Royal Statistical Society, Series A: Statistics in Society 168 (1), 185–205.Reiter, J.P., 2005c. Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21 (3), 441–462. Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.Rubin, D.B., 1993. Comment on “Statistical disclosure limitation”. Journal of Official Statistics 9, 461–468.Rubner, Y., Tomasi, C., Guibas, L.J., 1998. A metric for distributions with applications to image databases. Computer Vision, IEEE International Conference on 0, 59.Sarathy, R., Muralidhar, K., 2002a. The security of confidential numerical data in databases. Information Systems Research 13 (4), 389–403.Sarathy, R., Muralidhar, K., 2002b. The security of confidential numerical data in databases. Info. Sys. Research 13 (4), 389–403.Schafer, J.L., Graham, J.W., 2002. Missing data: Our view of state of the art. Psychological Methods 7 (2), 147–177.Singh, A., Yu, F., Dunteman, G., 2003. MASSC: A new data mask for limiting statistical information loss and disclosure. In: Proceedings of the Joint UNECE/EUROSTAT Work Session on Statistical Data Confidentiality. pp. 373–394.Skinner, C., 2009. Statistical disclosure control for survey data. In: Pfeffermann, D and Rao, C.R. eds. Handbook of Statistics Vol. 29A: Sample Surveys: Design, Methods and Applications. pp. 381–396.Skinner, C., Marsh, C., Openshaw, S., Wymer, C., 1994. Disclosure control for census microdata. Journal of Official Statistics 10, 31–51.Skinner, C., Shlomo, N., 2008. Assessing identification risk in survey microdata using log-linear models. Journal of the American Statistical Association 103, 989–1001.Skinner, C.J., Elliot, M.J., 2002. A measure of disclosure risk for microdata. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64 (4), 855–867.Smith, A., 2008. Efficient, dfferentially private point estimators. arXiv:0809.4794v1 [cs.CR].Spruill, N.L., 1982. Measures of confidentiality. Statistics of Income and Related Administrative Record Research, 131–136.Spruill, N.L., 1983. The confidentiality and analytic usefulness of masked business microdata. In: Proceedings of the Section on Survey Reserach Microdata. American Statistical Association, pp. 602–607.Sweeney, L., 1996. Replacing personally-identifying information in medical records, the scrub system. In: American Medical Informatics Association. Hanley and Belfus, Inc., pp. 333–337.Sweeney, L., 1997. Guaranteeing anonymity when sharing medical data, the datafly system. Journal of the American Medical Informatics Association 4, 51–55.Sweeney, L., 2002a. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge Based Systems 10 (5), 571–588. Sweeney, L., 2002b. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge Based Systems 10 (5), 557–570.Tendick, P., 1991. Optimal noise addition for preserving confidentiality in multivariate data. Journal of Statistical Planning and Inference 27 (2), 341–353.United Nations Economic Comission for Europe (UNECE), 2007. Manging statistical cinfidentiality and microdata access: Principles and guidlinesof good practice.Warner, S.L., 1965. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60 (309), 63–69.Wasserman, L., Zhou, S., 2010. A statistical framework for differential privacy. Journal of the American Statistical Association 105 (489), 375–389.Willenborg, L., de Waal, T., 2001. Elements of Statistical Disclosure Control. Springer-Verlag.Woodward, B., 1995. The computer-based patient record and confidentiality. The New England Journal of Medicine, 1419–1422. Full Article
ng Identifying the consequences of dynamic treatment strategies: A decision-theoretic overview By projecteuclid.org Published On :: Fri, 12 Nov 2010 11:39 EST A. Philip Dawid, Vanessa DidelezSource: Statist. Surv., Volume 4, 184--231.Abstract: We consider the problem of learning about and comparing the consequences of dynamic treatment strategies on the basis of observational data. We formulate this within a probabilistic decision-theoretic framework. Our approach is compared with related work by Robins and others: in particular, we show how Robins’s ‘ G -computation’ algorithm arises naturally from this decision-theoretic perspective. Careful attention is paid to the mathematical and substantive conditions required to justify the use of this formula. These conditions revolve around a property we term stability , which relates the probabilistic behaviours of observational and interventional regimes. We show how an assumption of ‘sequential randomization’ (or ‘no unmeasured confounders’), or an alternative assumption of ‘sequential irrelevance’, can be used to infer stability. Probabilistic influence diagrams are used to simplify manipulations, and their power and limitations are discussed. We compare our approach with alternative formulations based on causal DAGs or potential response models. We aim to show that formulating the problem of assessing dynamic treatment strategies as a problem of decision analysis brings clarity, simplicity and generality. References:Arjas, E. and Parner, J. (2004). Causal reasoning from longitudinal data. Scandinavian Journal of Statistics 31 171–187.Arjas, E. and Saarela, O. (2010). Optimal dynamic regimes: Presenting a case for predictive inference. The International Journal of Biostatistics 6. http://tinyurl.com/33dfssfCowell, R. G., Dawid, A. P., Lauritzen, S. L. and Spiegelhalter, D. J. (1999). Probabilistic Networks and Expert Systems. Springer, New York.Dawid, A. P. (1979). Conditional independence in statistical theory (with Discussion). Journal of the Royal Statistical Society, Series B 41 1–31.Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert systems. Statistics and Computing 2 25–36.Dawid, A. P. (1998). Conditional independence. In Encyclopedia of Statistical Science ({U}pdate Volume 2) ( S. Kotz, C. B. Read and D. L. Banks, eds.) 146–155. Wiley-Interscience, New York.Dawid, A. P. (2000). Causal inference without counterfactuals (with Discussion). Journal of the American Statistical Association 95 407–448.Dawid, A. P. (2001). Separoids: A mathematical framework for conditional independence and irrelevance. Annals of Mathematics and Artificial Intelligence 32 335–372.Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review 70 161–189. Corrigenda, ibid ., 437.Dawid, A. P. (2003). Causal inference using influence diagrams: The problem of partial compliance (with Discussion). In Highly Structured Stochastic Systems ( P. J. Green, N. L. Hjort and S. Richardson, eds.) 45–81. Oxford University Press.Dawid, A. P. (2010). Beware of the DAG! In Proceedings of the NIPS 2008 Workshop on Causality. Journal of Machine Learning Research Workshop and Conference Proceedings ( D. Janzing, I. Guyon and B. Schölkopf, eds.) 6 59–86. http://tinyurl.com/33va7tmDawid, A. P. and Didelez, V. (2008). Identifying optimal sequential decisions. In Proceedings of the Twenty-Fourth Annual Conference on Uncertainty in Artificial Intelligence (UAI-08) ( D. McAllester and A. Nicholson, eds.). 113-120. AUAI Press, Corvallis, Oregon. http://tinyurl.com/3899qppDechter, R. (2003). Constraint Processing. Morgan Kaufmann Publishers.Didelez, V., Dawid, A. P. and Geneletti, S. G. (2006). Direct and indirect effects of sequential treatments. In Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06) ( R. Dechter and T. Richardson, eds.). 138-146. AUAI Press, Arlington, Virginia. http://tinyurl.com/32w3f4eDidelez, V., Kreiner, S. and Keiding, N. (2010). Graphical models for inference under outcome dependent sampling. Statistical Science (to appear).Didelez, V. and Sheehan, N. S. (2007). Mendelian randomisation: Why epidemiology needs a formal language for causality. In Causality and Probability in the Sciences, ( F. Russo and J. Williamson, eds.). Texts in Philosophy Series 5 263–292. College Publications, London.Eichler, M. and Didelez, V. (2010). Granger-causality and the effect of interventions in time series. Lifetime Data Analysis 16 3–32.Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York, London.Geneletti, S. G. (2007). Identifying direct and indirect effects in a non–counterfactual framework. Journal of the Royal Statistical Society: Series B 69 199–215.Geneletti, S. G. and Dawid, A. P. (2010). Defining and identifying the effect of treatment on the treated. In Causality in the Sciences ( P. M. Illari, F. Russo and J. Williamson, eds.) Oxford University Press (to appear).Gill, R. D. and Robins, J. M. (2001). Causal inference for complex longitudinal data: The continuous case. Annals of Statistics 29 1785–1811.Guo, H. and Dawid, A. P. (2010). Sufficient covariates and linear propensity analysis. In Proceedings of the Thirteenth International Workshop on Artificial Intelligence and Statistics, (AISTATS) 2010, Chia Laguna, Sardinia, Italy, May 13-15, 2010. Journal of Machine Learning Research Workshop and Conference Proceedings ( Y. W. Teh and D. M. Titterington, eds.) 9 281–288. http://tinyurl.com/33lmuj7Henderson, R., Ansel, P. and Alshibani, D. (2010). Regret-regression for optimal dynamic treatment regimes. Biometrics (to appear). doi:10.1111/j.1541-0420.2009.01368.xHernán, M. A. and Taubman, S. L. (2008). Does obesity shorten life? The importance of well defined interventions to answer causal questions. International Journal of Obesity 32 S8–S14.Holland, P. W. (1986). Statistics and causal inference (with Discussion). Journal of the American Statistical Association 81 945–970.Huang, Y. and Valtorta, M. (2006). Identifiability in causal Bayesian networks: A sound and complete algorithm. In AAAI’06: Proceedings of the 21st National Conference on Artificial Intelligence 1149–1154. AAAI Press.Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22 523–539.Lauritzen, S. L., Dawid, A. P., Larsen, B. N. and Leimer, H. G. (1990). Independence properties of directed Markov fields. Networks 20 491–505.Lok, J., Gill, R., van der Vaart, A. and Robins, J. (2004). Estimating the causal effect of a time-varying treatment on time-to-event using structural nested failure time models. Statistica Neerlandica 58 271–295.Moodie, E. M., Richardson, T. S. and Stephens, D. A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics 63 447–455.Murphy, S. A. (2003). Optimal dynamic treatment regimes (with Discussion). Journal of the Royal Statistical Society, Series B 65 331-366.Oliver, R. M. and Smith, J. Q., eds. (1990). Influence Diagrams, Belief Nets and Decision Analysis. John Wiley and Sons, Chichester, United Kingdom.Pearl, J. (1995). Causal diagrams for empirical research (with Discussion). Biometrika 82 669-710.Pearl, J. (2009). Causality: Models, Reasoning and Inference, Second ed. Cambridge University Press, Cambridge.Pearl, J. and Paz, A. (1987). Graphoids: A graph-based logic for reasoning about relevance relations. In Advances in Artificial Intelligence ( D. Hogg and L. Steels, eds.) II 357–363. North-Holland, Amsterdam.Pearl, J. and Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence ( P. Besnard and S. Hanks, eds.) 444–453. Morgan Kaufmann Publishers, San Francisco.Raiffa, H. (1968). Decision Analysis. Addison-Wesley, Reading, Massachusetts.Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods—Application to control of the healthy worker survivor effect. Mathematical Modelling 7 1393–1512.Robins, J. M. (1987). Addendum to “A new approach to causal inference in mortality studies with sustained exposure periods—Application to control of the healthy worker survivor effect”. Computers & Mathematics with Applications 14 923–945.Robins, J. M. (1989). The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Health Service Research Methodology: A Focus on AIDS ( L. Sechrest, H. Freeman and A. Mulley, eds.) 113–159. NCSHR, U.S. Public Health Service.Robins, J. M. (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika 79 321–324.Robins, J. M. (1997). Causal inference from complex longitudinal data. In Latent Variable Modeling and Applications to Causality, ( M. Berkane, ed.). Lecture Notes in Statistics 120 69–117. Springer-Verlag, New York.Robins, J. M. (1998). Structural nested failure time models. In Survival Analysis, ( P. K. Andersen and N. Keiding, eds.). Encyclopedia of Biostatistics 6 4372–4389. John Wiley and Sons, Chichester, UK.Robins, J. M. (2000). Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association Section on Bayesian Statistical Science 1999 6–10.Robins, J. M. (2004). Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium on Biostatistics ( D. Y. Lin and P. Heagerty, eds.) 189–326. Springer, New York.Robins, J. M., Greenland, S. and Hu, F. C. (1999). Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association 94 687–700.Robins, J. M., Hernán, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550–560.Robins, J. M. and Wasserman, L. A. (1997). Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence ( D. Geiger and P. Shenoy, eds.) 409-420. Morgan Kaufmann Publishers, San Francisco. http://tinyurl.com/33ghsasRosthøj, S., Fullwood, C., Henderson, R. and Stewart, S. (2006). Estimation of optimal dynamic anticoagulation regimes from observational data: A regret-based approach. Statistics in Medicine 25 4197–4215.Shpitser, I. and Pearl, J. (2006a). Identification of conditional interventional distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06) ( R. Dechter and T. Richardson, eds.). 437–444. AUAI Press, Corvallis, Oregon. http://tinyurl.com/2um8w47Shpitser, I. and Pearl, J. (2006b). Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence 1219–1226. AAAI Press, Menlo Park, California.Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction and Search, Second ed. Springer-Verlag, New York.Sterne, J. A. C., May, M., Costagliola, D., de Wolf, F., Phillips, A. N., Harris, R., Funk, M. J., Geskus, R. B., Gill, J., Dabis, F., Miro, J. M., Justice, A. C., Ledergerber, B., Fatkenheuer, G., Hogg, R. S., D’Arminio-Monforte, A., Saag, M., Smith, C., Staszewski, S., Egger, M., Cole, S. R. and When To Start Consortium (2009). Timing of initiation of antiretroviral therapy in AIDS-Free HIV-1-infected patients: A collaborative analysis of 18 HIV cohort studies. Lancet 373 1352–1363.Taubman, S. L., Robins, J. M., Mittleman, M. A. and Hernán, M. A. (2009). Intervening on risk factors for coronary heart disease: An application of the parametric g-formula. International Journal of Epidemiology 38 1599–1611.Tian, J. (2008). Identifying dynamic sequential plans. In Proceedings of the Twenty-Fourth Annual Conference on Uncertainty in Artificial Intelligence (UAI-08) ( D. McAllester and A. Nicholson, eds.). 554–561. AUAI Press, Corvallis, Oregon. http://tinyurl.com/36ufx2hVerma, T. and Pearl, J. (1990). Causal networks: Semantics and expressiveness. In Uncertainty in Artificial Intelligence 4 ( R. D. Shachter, T. S. Levitt, L. N. Kanal and J. F. Lemmer, eds.) 69–76. North-Holland, Amsterdam. Full Article
ng Primal and dual model representations in kernel-based learning By projecteuclid.org Published On :: Wed, 25 Aug 2010 10:28 EDT Johan A.K. Suykens, Carlos Alzate, Kristiaan PelckmansSource: Statist. Surv., Volume 4, 148--183.Abstract: This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The specification of the estimation problem is conceived at the primal level as a constrained optimization problem. The constraints relate to the model which is expressed in terms of the feature map. From the conditions for optimality one jointly finds the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic for a support vector machine methodology. It is discussed how least squares support vector machines are playing a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization. Full Article
ng Finite mixture models and model-based clustering By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Volodymyr Melnykov, Ranjan MaitraSource: Statist. Surv., Volume 4, 80--116.Abstract: Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, for providing a convenient yet formal framework for clustering and classification. This paper provides a detailed review into mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed. Full Article
ng Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis. (arXiv:2005.02535v1 [econ.EM] CROSS LISTED) By arxiv.org Published On :: Arctic sea ice extent (SIE) in September 2019 ranked second-to-lowest in history and is trending downward. The understanding of how internal variability amplifies the effects of external $ ext{CO}_2$ forcing is still limited. We propose the VARCTIC, which is a Vector Autoregression (VAR) designed to capture and extrapolate Arctic feedback loops. VARs are dynamic simultaneous systems of equations, routinely estimated to predict and understand the interactions of multiple macroeconomic time series. Hence, the VARCTIC is a parsimonious compromise between fullblown climate models and purely statistical approaches that usually offer little explanation of the underlying mechanism. Our "business as usual" completely unconditional forecast has SIE hitting 0 in September by the 2060s. Impulse response functions reveal that anthropogenic $ ext{CO}_2$ emission shocks have a permanent effect on SIE - a property shared by no other shock. Further, we find Albedo- and Thickness-based feedbacks to be the main amplification channels through which $ ext{CO}_2$ anomalies impact SIE in the short/medium run. Conditional forecast analyses reveal that the future path of SIE crucially depends on the evolution of $ ext{CO}_2$ emissions, with outcomes ranging from recovering SIE to it reaching 0 in the 2050s. Finally, Albedo and Thickness feedbacks are shown to play an important role in accelerating the speed at which predicted SIE is heading towards 0. Full Article
ng Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED) By arxiv.org Published On :: Methods for generating synthetic data have become of increasing importance to build large datasets required for Convolution Neural Networks (CNN) based deep learning techniques for a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to provide 3D facial models. For the proposed research work we have used tufts datasets for generating 3D varying face poses by using a single frontal face pose. The system works by refining the existing image quality by performing fusion based image preprocessing operations. The refined outputs have better contrast adjustments, decreased noise level and higher exposedness of the dark regions. It makes the facial landmarks and temperature patterns on the human face more discernible and visible when compared to original raw data. Different image quality metrics are used to compare the refined version of images with original images. In the next phase of the proposed study, the refined version of images is used to create 3D facial geometry structures by using Convolution Neural Networks (CNN). The generated outputs are then imported in blender software to finally extract the 3D thermal facial outputs of both males and females. The same technique is also used on our thermal face data acquired using prototype thermal camera (developed under Heliaus EU project) in an indoor lab environment which is then used for generating synthetic 3D face data along with varying yaw face angles and lastly facial depth map is generated. Full Article
ng Interpreting Rate-Distortion of Variational Autoencoder and Using Model Uncertainty for Anomaly Detection. (arXiv:2005.01889v2 [cs.LG] UPDATED) By arxiv.org Published On :: Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One of the prevalent methods is using a reconstruction error from variational autoencoder (VAE) via maximizing the evidence lower bound. We revisit VAE from the perspective of information theory to provide some theoretical foundations on using the reconstruction error, and finally arrive at a simpler and more effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the metric. We show empirically the competitive performance of our approach on benchmark datasets. Full Article
ng Data-Space Inversion Using a Recurrent Autoencoder for Time-Series Parameterization. (arXiv:2005.00061v2 [stat.ML] UPDATED) By arxiv.org Published On :: Data-space inversion (DSI) and related procedures represent a family of methods applicable for data assimilation in subsurface flow settings. These methods differ from model-based techniques in that they provide only posterior predictions for quantities (time series) of interest, not posterior models with calibrated parameters. DSI methods require a large number of flow simulations to first be performed on prior geological realizations. Given observed data, posterior predictions can then be generated directly. DSI operates in a Bayesian setting and provides posterior samples of the data vector. In this work we develop and evaluate a new approach for data parameterization in DSI. Parameterization reduces the number of variables to determine in the inversion, and it maintains the physical character of the data variables. The new parameterization uses a recurrent autoencoder (RAE) for dimension reduction, and a long-short-term memory (LSTM) network to represent flow-rate time series. The RAE-based parameterization is combined with an ensemble smoother with multiple data assimilation (ESMDA) for posterior generation. Results are presented for two- and three-phase flow in a 2D channelized system and a 3D multi-Gaussian model. The RAE procedure, along with existing DSI treatments, are assessed through comparison to reference rejection sampling (RS) results. The new DSI methodology is shown to consistently outperform existing approaches, in terms of statistical agreement with RS results. The method is also shown to accurately capture derived quantities, which are computed from variables considered directly in DSI. This requires correlation and covariance between variables to be properly captured, and accuracy in these relationships is demonstrated. The RAE-based parameterization developed here is clearly useful in DSI, and it may also find application in other subsurface flow problems. Full Article
ng A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging. (arXiv:2004.12314v3 [cs.CV] UPDATED) By arxiv.org Published On :: Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to its attenuated contrast. Since most clinical studies have relied on manual and labor-intensive approaches, automatic methods are of high interest, particularly optimized machine learning approaches. To address this, we organized the "2018 Left Atrium Segmentation Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset, and associated labels of the left atrium segmented by three medical experts, ultimately attracting the participation of 27 international teams. In this paper, extensive analysis of the submitted algorithms using technical and biological metrics was performed by undergoing subgroup analysis and conducting hyper-parameter analysis, offering an overall picture of the major design choices of convolutional neural networks (CNNs) and practical considerations for achieving state-of-the-art left atrium segmentation. Results show the top method achieved a dice score of 93.2% and a mean surface to a surface distance of 0.7 mm, significantly outperforming prior state-of-the-art. Particularly, our analysis demonstrated that double, sequentially used CNNs, in which a first CNN is used for automatic region-of-interest localization and a subsequent CNN is used for refined regional segmentation, achieved far superior results than traditional methods and pipelines containing single CNNs. This large-scale benchmarking study makes a significant step towards much-improved segmentation methods for cardiac LGE-MRIs, and will serve as an important benchmark for evaluating and comparing the future works in the field. Full Article
ng Excess registered deaths in England and Wales during the COVID-19 pandemic, March 2020 and April 2020. (arXiv:2004.11355v4 [stat.AP] UPDATED) By arxiv.org Published On :: Official counts of COVID-19 deaths have been criticized for potentially including people who did not die of COVID-19 but merely died with COVID-19. I address that critique by fitting a generalized additive model to weekly counts of all registered deaths in England and Wales during the 2010s. The model produces baseline rates of death registrations expected in the absence of the COVID-19 pandemic, and comparing those baselines to recent counts of registered deaths exposes the emergence of excess deaths late in March 2020. Among adults aged 45+, about 38,700 excess deaths were registered in the 5 weeks comprising 21 March through 24 April (612 $pm$ 416 from 21$-$27 March, 5675 $pm$ 439 from 28 March through 3 April, then 9183 $pm$ 468, 12,712 $pm$ 589, and 10,511 $pm$ 567 in April's next 3 weeks). Both the Office for National Statistics's respective count of 26,891 death certificates which mention COVID-19, and the Department of Health and Social Care's hospital-focused count of 21,222 deaths, are appreciably less, implying that their counting methods have underestimated rather than overestimated the pandemic's true death toll. If underreporting rates have held steady, about 45,900 direct and indirect COVID-19 deaths might have been registered by April's end but not yet publicly reported in full. Full Article
ng A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting. (arXiv:2004.09612v3 [cs.LG] UPDATED) By arxiv.org Published On :: Cooperation between different data owners may lead to an improvement in forecast quality - for instance by benefiting from spatial-temporal dependencies in geographically distributed time series. Due to business competitive factors and personal data protection questions, said data owners might be unwilling to share their data, which increases the interest in collaborative privacy-preserving forecasting. This paper analyses the state-of-the-art and unveils several shortcomings of existing methods in guaranteeing data privacy when employing Vector Autoregressive (VAR) models. The paper also provides mathematical proofs and numerical analysis to evaluate existing privacy-preserving methods, dividing them into three groups: data transformation, secure multi-party computations, and decomposition methods. The analysis shows that state-of-the-art techniques have limitations in preserving data privacy, such as a trade-off between privacy and forecasting accuracy, while the original data in iterative model fitting processes, in which intermediate results are shared, can be inferred after some iterations. Full Article