ar

A new log-linear bimodal Birnbaum–Saunders regression model with application to survival data

Francisco Cribari-Neto, Rodney V. Fonseca.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 329--355.

Abstract:
The log-linear Birnbaum–Saunders model has been widely used in empirical applications. We introduce an extension of this model based on a recently proposed version of the Birnbaum–Saunders distribution which is more flexible than the standard Birnbaum–Saunders law since its density may assume both unimodal and bimodal shapes. We show how to perform point estimation, interval estimation and hypothesis testing inferences on the parameters that index the regression model we propose. We also present a number of diagnostic tools, such as residual analysis, local influence, generalized leverage, generalized Cook’s distance and model misspecification tests. We investigate the usefulness of model selection criteria and the accuracy of prediction intervals for the proposed model. Results of Monte Carlo simulations are presented. Finally, we also present and discuss an empirical application.




ar

The coreset variational Bayes (CVB) algorithm for mixture analysis

Qianying Liu, Clare A. McGrory, Peter W. J. Baxter.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 267--279.

Abstract:
The pressing need for improved methods for analysing and coping with big data has opened up a new area of research for statisticians. Image analysis is an area where there is typically a very large number of data points to be processed per image, and often multiple images are captured over time. These issues make it challenging to design methodology that is reliable and yet still efficient enough to be of practical use. One promising emerging approach for this problem is to reduce the amount of data that actually has to be processed by extracting what we call coresets from the full dataset; analysis is then based on the coreset rather than the whole dataset. Coresets are representative subsamples of data that are carefully selected via an adaptive sampling approach. We propose a new approach called coreset variational Bayes (CVB) for mixture modelling; this is an algorithm which can perform a variational Bayes analysis of a dataset based on just an extracted coreset of the data. We apply our algorithm to weed image analysis.




ar

A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases

Kushal K. Dey, Sourabh Bhattacharya.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 222--266.

Abstract:
Transformation based Markov Chain Monte Carlo (TMCMC) was proposed by Dutta and Bhattacharya ( Statistical Methodology 16 (2014) 100–116) as an efficient alternative to the Metropolis–Hastings algorithm, especially in high dimensions. The main advantage of this algorithm is that it simultaneously updates all components of a high dimensional parameter using appropriate move types defined by deterministic transformation of a single random variable. This results in reduction in time complexity at each step of the chain and enhances the acceptance rate. In this paper, we first provide a brief review of the optimal scaling theory for various existing MCMC approaches, comparing and contrasting them with the corresponding TMCMC approaches.The optimal scaling of the simplest form of TMCMC, namely additive TMCMC , has been studied extensively for the Gaussian proposal density in Dey and Bhattacharya (2017a). Here, we discuss diffusion-based optimal scaling behavior of additive TMCMC for non-Gaussian proposal densities—in particular, uniform, Student’s $t$ and Cauchy proposals. Although we could not formally prove our diffusion result for the Cauchy proposal, simulation based results lead us to conjecture that at least the recipe for obtaining general optimal scaling and optimal acceptance rate holds for the Cauchy case as well. We also consider diffusion based optimal scaling of TMCMC when the target density is discontinuous. Such non-regular situations have been studied in the case of Random Walk Metropolis Hastings (RWMH) algorithm by Neal and Roberts ( Methodology and Computing in Applied Probability 13 (2011) 583–601) using expected squared jumping distance (ESJD), but the diffusion theory based scaling has not been considered. We compare our diffusion based optimally scaled TMCMC approach with the ESJD based optimally scaled RWM with simulation studies involving several target distributions and proposal distributions including the challenging Cauchy proposal case, showing that additive TMCMC outperforms RWMH in almost all cases considered.




ar

Bayesian robustness to outliers in linear regression and ratio estimation

Alain Desgagné, Philippe Gagnon.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 205--221.

Abstract:
Whole robustness is a nice property to have for statistical models. It implies that the impact of outliers gradually vanishes as they approach plus or minus infinity. So far, the Bayesian literature provides results that ensure whole robustness for the location-scale model. In this paper, we make two contributions. First, we generalise the results to attain whole robustness in simple linear regression through the origin, which is a necessary step towards results for general linear regression models. We allow the variance of the error term to depend on the explanatory variable. This flexibility leads to the second contribution: we provide a simple Bayesian approach to robustly estimate finite population means and ratios. The strategy to attain whole robustness is simple since it lies in replacing the traditional normal assumption on the error term by a super heavy-tailed distribution assumption. As a result, users can estimate the parameters as usual, using the posterior distribution.




ar

An estimation method for latent traits and population parameters in Nominal Response Model

Caio L. N. Azevedo, Dalton F. Andrade

Source: Braz. J. Probab. Stat., Volume 24, Number 3, 415--433.

Abstract:
The nominal response model (NRM) was proposed by Bock [ Psychometrika 37 (1972) 29–51] in order to improve the latent trait (ability) estimation in multiple choice tests with nominal items. When the item parameters are known, expectation a posteriori or maximum a posteriori methods are commonly employed to estimate the latent traits, considering a standard symmetric normal distribution as the latent traits prior density. However, when this item set is presented to a new group of examinees, it is not only necessary to estimate their latent traits but also the population parameters of this group. This article has two main purposes: first, to develop a Monte Carlo Markov Chain algorithm to estimate both latent traits and population parameters concurrently. This algorithm comprises the Metropolis–Hastings within Gibbs sampling algorithm (MHWGS) proposed by Patz and Junker [ Journal of Educational and Behavioral Statistics 24 (1999b) 346–366]. Second, to compare, in the latent trait recovering, the performance of this method with three other methods: maximum likelihood, expectation a posteriori and maximum a posteriori. The comparisons were performed by varying the total number of items (NI), the number of categories and the values of the mean and the variance of the latent trait distribution. The results showed that MHWGS outperforms the other methods concerning the latent traits estimation as well as it recoveries properly the population parameters. Furthermore, we found that NI accounts for the highest percentage of the variability in the accuracy of latent trait estimation.




ar

Nights below Foord Street : literature and popular culture in postindustrial Nova Scotia

Thompson, Peter, 1981- author.
0773559345




ar

Public-private partnerships in Canada : law, policy and value for money

Murphy, Timothy J. (Timothy John), author.
9780433457985 (Cloth)




ar

Globalizing capital : a history of the international monetary system

Eichengreen, Barry J., author.
9780691193908 (paperback)




ar

Documenting rebellions : a study of four lesbian and gay archives in queer times

Sheffield, Rebecka Taves, author.
9781634000918 paperback




ar

Flexible, boundary adapted, nonparametric methods for the estimation of univariate piecewise-smooth functions

Umberto Amato, Anestis Antoniadis, Italia De Feis.

Source: Statistics Surveys, Volume 14, 32--70.

Abstract:
We present and compare some nonparametric estimation methods (wavelet and/or spline-based) designed to recover a one-dimensional piecewise-smooth regression function in both a fixed equidistant or not equidistant design regression model and a random design model. Wavelet methods are known to be very competitive in terms of denoising and compression, due to the simultaneous localization property of a function in time and frequency. However, boundary assumptions, such as periodicity or symmetry, generate bias and artificial wiggles which degrade overall accuracy. Simple methods have been proposed in the literature for reducing the bias at the boundaries. We introduce new ones based on adaptive combinations of two estimators. The underlying idea is to combine a highly accurate method for non-regular functions, e.g., wavelets, with one well behaved at boundaries, e.g., Splines or Local Polynomial. We provide some asymptotic optimal results supporting our approach. All the methods can handle data with a random design. We also sketch some generalization to the multidimensional setting. To study the performance of the proposed approaches we have conducted an extensive set of simulations on synthetic data. An interesting regression analysis of two real data applications using these procedures unambiguously demonstrates their effectiveness.




ar

Estimating the size of a hidden finite set: Large-sample behavior of estimators

Si Cheng, Daniel J. Eck, Forrest W. Crawford.

Source: Statistics Surveys, Volume 14, 1--31.

Abstract:
A finite set is “hidden” if its elements are not directly enumerable or if its size cannot be ascertained via a deterministic query. In public health, epidemiology, demography, ecology and intelligence analysis, researchers have developed a wide variety of indirect statistical approaches, under different models for sampling and observation, for estimating the size of a hidden set. Some methods make use of random sampling with known or estimable sampling probabilities, and others make structural assumptions about relationships (e.g. ordering or network information) between the elements that comprise the hidden set. In this review, we describe models and methods for learning about the size of a hidden finite set, with special attention to asymptotic properties of estimators. We study the properties of these methods under two asymptotic regimes, “infill” in which the number of fixed-size samples increases, but the population size remains constant, and “outfill” in which the sample size and population size grow together. Statistical properties under these two regimes can be dramatically different.




ar

Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists

John J. Dziak, Donna L. Coffman, Matthew Reimherr, Justin Petrovich, Runze Li, Saul Shiffman, Mariya P. Shiyko.

Source: Statistics Surveys, Volume 13, 150--180.

Abstract:
Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.




ar

PLS for Big Data: A unified parallel algorithm for regularised group PLS

Pierre Lafaye de Micheaux, Benoît Liquet, Matthew Sutton.

Source: Statistics Surveys, Volume 13, 119--149.

Abstract:
Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scalable to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS .




ar

Pitfalls of significance testing and $p$-value variability: An econometrics perspective

Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker.

Source: Statistics Surveys, Volume 12, 136--172.

Abstract:
Data on how many scientific findings are reproducible are generally bleak and a wealth of papers have warned against misuses of the $p$-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely considered as the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed (“classical”) pitfalls of significance testing, and review published work on these misuses with a focus on regression-based “confirmatory” study. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts (“vote counting”). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate “$p$-value” as a measure of evidence without considering its sample-to-sample variability falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms.




ar

A review of dynamic network models with latent variables

Bomin Kim, Kevin H. Lee, Lingzhou Xue, Xiaoyue Niu.

Source: Statistics Surveys, Volume 12, 105--135.

Abstract:
We present a selective review of statistical modeling of dynamic networks. We focus on models with latent variables, specifically, the latent space models and the latent class models (or stochastic blockmodels), which investigate both the observed features and the unobserved structure of networks. We begin with an overview of the static models, and then we introduce the dynamic extensions. For each dynamic model, we also discuss its applications that have been studied in the literature, with the data source listed in Appendix. Based on the review, we summarize a list of open problems and challenges in dynamic network modeling with latent variables.




ar

Variable selection methods for model-based clustering

Michael Fop, Thomas Brendan Murphy.

Source: Statistics Surveys, Volume 12, 18--65.

Abstract:
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.




ar

Measuring multivariate association and beyond

Julie Josse, Susan Holmes.

Source: Statistics Surveys, Volume 10, 132--167.

Abstract:
Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association’s underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarii where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research.




ar

A comparison of spatial predictors when datasets could be very large

Jonathan R. Bradley, Noel Cressie, Tao Shi.

Source: Statistics Surveys, Volume 10, 100--131.

Abstract:
In this article, we review and compare a number of methods of spatial prediction, where each method is viewed as an algorithm that processes spatial data. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, fixed rank kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $mathrm{CO}_{2}$ data from NASA’s AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.




ar

$M$-functionals of multivariate scatter

Lutz Dümbgen, Markus Pauly, Thomas Schweizer.

Source: Statistics Surveys, Volume 9, 32--105.

Abstract:
This survey provides a self-contained account of $M$-estimation of multivariate scatter. In particular, we present new proofs for existence of the underlying $M$-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. By doing so we reveal a connection between Tyler’s (1987a) $M$-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally these results are applied to $M$-estimation of multivariate location and scatter via multivariate $t$-distributions.




ar

Semi-parametric estimation for conditional independence multivariate finite mixture models

Didier Chauveau, David R. Hunter, Michael Levine.

Source: Statistics Surveys, Volume 9, 1--31.

Abstract:
The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.




ar

Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen.

Source: Statistics Surveys, Volume 8, , 1--1.

Abstract:
Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys , 6 (2012), 142–228. doi:10.1214/12-SS102.




ar

A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen

Source: Statist. Surv., Volume 6, 142--228.

Abstract:
To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.




ar

Curse of dimensionality and related issues in nonparametric functional regression

Gery Geenens

Source: Statist. Surv., Volume 5, 30--43.

Abstract:
Recently, some nonparametric regression ideas have been extended to the case of functional regression. Within that framework, the main concern arises from the infinite dimensional nature of the explanatory objects. Specifically, in the classical multivariate regression context, it is well-known that any nonparametric method is affected by the so-called “curse of dimensionality”, caused by the sparsity of data in high-dimensional spaces, resulting in a decrease in fastest achievable rates of convergence of regression function estimators toward their target curve as the dimension of the regressor vector increases. Therefore, it is not surprising to find dramatically bad theoretical properties for the nonparametric functional regression estimators, leading many authors to condemn the methodology. Nevertheless, a closer look at the meaning of the functional data under study and on the conclusions that the statistician would like to draw from it allows to consider the problem from another point-of-view, and to justify the use of slightly modified estimators. In most cases, it can be entirely legitimate to measure the proximity between two elements of the infinite dimensional functional space via a semi-metric, which could prevent those estimators suffering from what we will call the “curse of infinite dimensionality”.

References:
[1] Ait-Saïdi, A., Ferraty, F., Kassa, K. and Vieu, P. (2008). Cross-validated estimations in the single-functional index model, Statistics, 42, 475–494.

[2] Aneiros-Perez, G. and Vieu, P. (2008). Nonparametric time series prediction: A semi-functional partial linear modeling, J. Multivariate Anal., 99, 834–857.

[3] Baillo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response, J. Multivariate Anal., 100, 102–111.

[4] Burba, F., Ferraty, F. and Vieu, P. (2009). k-Nearest Neighbour method in functional nonparametric regression, J. Nonparam. Stat., 21, 453–469.

[5] Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model, Stat. Probabil. Lett., 45, 11–22.

[6] Crambes, C., Kneip, A. and Sarda, P. (2009). Smoothing splines estimators for functional linear regression, Ann. Statist., 37, 35–72.

[7] Delsol, L. (2009). Advances on asymptotic normality in nonparametric functional time series analysis, Statistics, 43, 13–33.

[8] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications, Chapman and Hall, London.

[9] Fan, J. and Zhang, J.-T. (2000). Two-step estimation of functional linear models with application to longitudinal data, J. Roy. Stat. Soc. B, 62, 303–322.

[10] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis, Springer-Verlag, New York.

[11] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating Some Characteristics of the Conditional Distribution in Nonparametric Functional Models, Statist. Inf. Stoch. Proc., 9, 47–76.

[12] Ferraty, F., Mas, A. and Vieu, P. (2007). Nonparametric regression on functional data: inference and practical aspects, Aust. NZ. J. Stat., 49, 267–286.

[13] Ferraty, F., Van Keilegom, I. and Vieu, P. (2010). On the validity of the bootstrap in nonparametric functional regression, Scand. J. Stat., 37, 286–306.

[14] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables, J. Stat. Plan. Inf., 140, 335–352.

[15] Ferraty, F. and Romain, Y. (2011). Oxford handbook on functional data analysis (Eds), Oxford University Press.

[16] Gasser, T., Hall, P. and Presnell, B. (1998). Nonparametric estimation of the mode of a distribution of random curves, J. Roy. Stat. Soc. B, 60, 681–691.

[17] Geenens, G. (2011). A nonparametric functional method for signature recognition, Manuscript.

[18] Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2004). Nonparametric and semiparametric models, Springer-Verlag, Berlin.

[19] James, G.M. (2002). Generalized linear models with functional predictors, J. Roy. Stat. Soc. B, 64, 411–432.

[20] Masry, E. (2005). Nonparametric regression estimation for dependent functional data: asymptotic normality, Stochastic Process. Appl., 115, 155–177.

[21] Nadaraya, E.A. (1964). On estimating regression, Theory Probab. Applic., 9, 141–142.

[22] Quintela-Del-Rio, A. (2008). Hazard function given a functional variable: nonparametric estimation under strong mixing conditions, J. Nonparam. Stat., 20, 413–430.

[23] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection, J. Stat. Plan. Inf., 137, 2784–2801.

[24] Ramsay, J. and Silverman, B.W. (1997). Functional Data Analysis, Springer-Verlag, New York.

[25] Ramsay, J. and Silverman, B.W. (2002). Applied functional data analysis; methods and case study, Springer-Verlag, New York.

[26] Ramsay, J. and Silverman, B.W. (2005). Functional Data Analysis, 2nd Edition, Springer-Verlag, New York.

[27] Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression, Ann. Stat., 10, 1040–1053.

[28] Watson, G.S. (1964). Smooth regression analysis, Sankhya A, 26, 359–372.

[29] Yeung, D.T., Chang, H., Xiong, Y., George, S., Kashi, R., Matsumoto, T. and Rigoll, G. (2004). SVC2004: First International Signature Verification Competition, Proceedings of the International Conference on Biometric Authentication (ICBA), Hong Kong, July 2004.




ar

The ARMA alphabet soup: A tour of ARMA model variants

Scott H. Holan, Robert Lund, Ginger Davis

Source: Statist. Surv., Volume 4, 232--274.

Abstract:
Autoregressive moving-average (ARMA) difference equations are ubiquitous models for short memory time series and have parsimoniously described many stationary series. Variants of ARMA models have been proposed to describe more exotic series features such as long memory autocovariances, periodic autocovariances, and count support set structures. This review paper enumerates, compares, and contrasts the common variants of ARMA models in today’s literature. After the basic properties of ARMA models are reviewed, we tour ARMA variants that describe seasonal features, long memory behavior, multivariate series, changing variances (stochastic volatility) and integer counts. A list of ARMA variant acronyms is provided.

References:
Aknouche, A. and Guerbyenne, H. (2006). Recursive estimation of GARCH models. Communications in Statistics-Simulation and Computation 35 925–938.

Alzaid, A. A. and Al-Osh, M. (1990). An integer-valued pth-order autoregressive structure (INAR (p)) process. Journal of Applied Probability 27 314–324.

Anderson, P. L., Tesfaye, Y. G. and Meerschaert, M. M. (2007). Fourier-PARMA models and their application to river flows. Journal of Hydrologic Engineering 12 462–472.

Ansley, C. F. (1979). An algorithm for the exact likelihood of a mixed autoregressive-moving average process. Biometrika 66 59–65.

Basawa, I. V. and Lund, R. (2001). Large sample properties of parameter estimates for periodic ARMA models. Journal of Time Series Analysis 22 651–663.

Bauwens, L., Laurent, S. and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey. Journal of Applied Econometrics 21 79–109.

Bertelli, S. and Caporin, M. (2002). A note on calculating autocovariances of long-memory processes. Journal of Time Series Analysis 23 503–508.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31 307–327.

Bollerslev, T. (2008). Glossary to ARCH (GARCH). CREATES Research Paper 2008-49.

Bollerslev, T., Engle, R. F. and Wooldridge, J. M. (1988). A capital asset pricing model with time-varying covariances. The Journal of Political Economy 96 116–131.

Bondon, P. and Palma, W. (2007). A class of antipersistent processes. Journal of Time Series Analysis 28 261–273.

Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. The Annals of Probability 20 1714–1730.

Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008). Time Series Analysis: Forecasting and Control, 4th ed. Wiley, New Jersey.

Breidt, F. J., Davis, R. A. and Trindade, A. A. (2001). Least absolute deviation estimation for all-pass time series models. Annals of Statistics 29 919–946.

Brockwell, P. J. (1994). On continuous-time threshold ARMA processes. Journal of Statistical Planning and Inference 39 291–303.

Brockwell, P. J. (2001). Continuous-time ARMA processes. In Stochastic Processes: Theory and Methods, ( D. N. Shanbhag and C. R. Rao, eds.). Handbook of Statistics 19 249–276. Elsevier.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. Springer, New York.

Brockwell, P. J. and Davis, R. A. (2002). Introduction to Time Series and Forecasting, 2nd ed. Springer, New York.

Brockwell, P. J. and Marquardt, T. (2005). Lèvy-driven and fractionally integrated ARMA processes with continuous-time paramaters. Statistica Sinica 15 477–494.

Chan, K. S. (1990). Testing for threshold autoregression. Annals of Statistics 18 1886–1894.

Chan, N. H. (2002). Time Series: Applications to Finance. John Wiley & Sons, New York.

Chan, N. H. and Palma, W. (1998). State space modeling of long-memory processes. Annals of Statistics 26 719–740.

Chan, N. H. and Palma, W. (2006). Estimation of long-memory time series models: A survey of different likelihood-based methods. Advances in Econometrics 20 89–121.

Chatfield, C. (2003). The Analysis of Time Series: An Introduction, 6th ed. Chapman & Hall/CRC, Boca Raton.

Chen, W., Hurvich, C. M. and Lu, Y. (2006). On the correlation matrix of the discrete Fourier transform and the fast solution of large Toeplitz systems for long-memory time series. Journal of the American Statistical Association 101 812–822.

Chernick, M. R., Hsing, T. and McCormick, W. P. (1991). Calculating the extremal index for a class of stationary sequences. Advances in Applied Probability 23 835–850.

Chib, S., Nardari, F. and Shephard, N. (2006). Analysis of high dimensional multivariate stochastic volatility models. Journal of Econometrics 134 341–371.

Cryer, J. D. and Chan, K. S. (2008). Time Series Analysis: With Applications in R. Springer, New York.

Cui, Y. and Lund, R. (2009). A new look at time series of counts. Biometrika 96 781–792.

Davis, R. A., Dunsmuir, W. T. M. and Wang, Y. (1999). Modeling time series of count data. In Asymptotics, Nonparametrics and Time Series, ( S. Ghosh, ed.). Statistics Textbooks Monograph 63–113. Marcel Dekker, New York.

Davis, R. A., Dunsmuir, W. and Streett, S. B. (2003). Observation-driven models for Poisson counts. Biometrika 90 777–790.

Davis, R. A. and Resnick, S. I. (1996). Limit theory for bilinear processes with heavy-tailed noise. The Annals of Applied Probability 6 1191–1210.

Deistler, M. and Hannan, E. J. (1981). Some properties of the parameterization of ARMA systems with unknown order. Journal of Multivariate Analysis 11 474–484.

Dufour, J. M. and Jouini, T. (2005). Asymptotic distribution of a simple linear estimator for VARMA models in echelon form. Statistical Modeling and Analysis for Complex Data Problems 209–240.

Dunsmuir, W. and Hannan, E. J. (1976). Vector linear time series models. Advances in Applied Probability 8 339–364.

Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press, Oxford.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50 987–1007.

Engle, R. F. (2002). Dynamic conditional correlation. Journal of Business and Economic Statistics 20 339–350.

Engle, R. F. and Bollerslev, T. (1986). Modelling the persistence of conditional variances. Econometric Reviews 5 1–50.

Fuller, W. A. (1996). Introduction to Statistical Time Series, 2nd ed. John Wiley & Sons, New York.

Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series models. Journal of Time Series Analysis 4 221–238.

Gladyšhev, E. G. (1961). Periodically correlated random sequences. Soviet Math 2 385–388.

Granger, C. W. J. (1982). Acronyms in time series analysis (ATSA). Journal of Time Series Analysis 3 103–107.

Granger, C. W. J. and Andersen, A. P. (1978). An Introduction to Bilinear Time Series Models. Vandenhoeck and Ruprecht Göttingen.

Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis 1 15–29.

Gray, H. L., Zhang, N. F. and Woodward, W. A. (1989). On generalized fractional processes. Journal of Time Series Analysis 10 233–257.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.

Hannan, E. J. (1955). A test for singularities in Sydney rainfall. Australian Journal of Physics 8 289–297.

Hannan, E. J. (1969). The identification of vector mixed autoregressive-moving average system. Biometrika 56 223–225.

Hannan, E. J. (1970). Multiple Time Series. John Wiley & Sons, New York.

Hannan, E. J. (1976). The identification and parameterization of ARMAX and state space forms. Econometrica 44 713–723.

Hannan, E. J. (1979). The Statistical Theory of Linear Systems. In Developments in Statistics ( P. R. Krishnaiah, ed.) 83–121. Academic Press, New York.

Hannan, E. J. and Deistler, M. (1987). The Statistical Theory of Linear Systems. John Wiley & Sons, New York.

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.

Haslett, J. and Raftery, A. E. (1989). Space-time modelling with long-memory dependence: Assessing Ireland’s wind power resource. Applied Statistics 38 1–50.

Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68 165–176.

Hui, Y. V. and Li, W. K. (1995). On fractionally differenced periodic processes. Sankhyā: The Indian Journal of Statistics, Series B 57 19–31.

Jacobs, P. A. and Lewis, P. A. W. (1978a). Discrete time series generated by mixtures. I: Correlational and runs properties. Journal of the Royal Statistical Society. Series B (Methodological) 40 94–105.

Jacobs, P. A. and Lewis, P. A. W. (1978b). Discrete time series generated by mixtures II: Asymptotic properties. Journal of the Royal Statistical Society. Series B (Methodological) 40 222–228.

Jacobs, P. A. and Lewis, P. A. W. (1983). Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis 4 19–36.

Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22 389–395.

Jones, R. H. and Brelsford, W. M. (1967). Time series with periodic structure. Biometrika 54 403–408.

Kedem, B. and Fokianos, K. (2002). Regression Models for Time Series Analysis. John Wiley & Sons, New Jersey.

Ko, K. and Vannucci, M. (2006). Bayesian wavelet-based methods for the detection of multiple changes of the long memory parameter. IEEE Transactions on Signal Processing 54 4461–4470.

Kohn, R. (1979). Asymptotic estimation and hypothesis testing results for vector linear time series models. Econometrica 47 1005–1030.

Kokoszka, P. S. and Taqqu, M. S. (1995). Fractional ARIMA with stable innovations. Stochastic Processes and their Applications 60 19–47.

Kokoszka, P. S. and Taqqu, M. S. (1996). Parameter estimation for infinite variance fractional ARIMA. Annals of Statistics 24 1880–1913.

Lawrance, A. J. and Lewis, P. A. W. (1980). The exponential autoregressive-moving average EARMA(p,q) process. Journal of the Royal Statistical Society. Series B (Methodological) 42 150–161.

Ling, S. and Li, W. K. (1997). On fractionally integrated autoregressive moving-average time series models with conditional heteroscedasticity. Journal of the American Statistical Association 92 1184–1194.

Liu, J. and Brockwell, P. J. (1988). On the general bilinear time series model. Journal of Applied Probability 25 553–564.

Lund, R. and Basawa, I. V. (2000). Recursive prediction and likelihood evaluation for periodic ARMA models. Journal of Time Series Analysis 21 75–93.

Lund, R., Shao, Q. and Basawa, I. (2006). Parsimonious periodic time series modeling. Australian & New Zealand Journal of Statistics 48 33–47.

Lütkepohl, H. (1991). Introduction to Multiple Time Series Analysis. Springer-Verlag, New York.

Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer, New York.

MacDonald, I. L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman & Hall/CRC, Boca Raton.

Mann, H. B. and Wald, A. (1943). On the statistical treatment of linear stochastic difference equations. Econometrica 11 173–220.

Marriott, J., Ravishanker, N., Gelfand, A. and Pai, J. (1996). Bayesian analysis of ARMA processes: Complete sampling-based inference under exact likelihoods. In Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner ( D. Berry, K. Challoner and J. Geweke, eds.) 243–256. Wiley, New York.

McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20 822–835.

Mikosch, T. and Starica, C. (2004). Nonstationarities in financial time series, the long-range dependence, and the IGARCH effects. Review of Economics and Statistics 86 378–390.

Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59 347–370.

Nelson, D. B. and Cao, C. Q. (1992). Inequality constraints in the univariate GARCH model. Journal of Business and Economic Statistics 10 229–235.

Ooms, M. and Franses, P. H. (2001). A seasonal periodic long memory model for monthly river flows. Environmental Modelling & Software 16 559–569.

Pagano, M. (1978). On periodic and multiple autoregressions. Annals of Statistics 6 1310–1317.

Pai, J. S. and Ravishanker, N. (1998). Bayesian analysis of autoregressive fractionally integrated moving-average processes. Journal of Time Series Analysis 19 99–112.

Palma, W. (2007). Long-Memory Time Series: Theory and Methods. John Wiley & Sons, New Jersey.

Palma, W. and Chan, N. H. (2005). Efficient estimation of seasonal long-range-dependent processes. Journal of Time Series Analysis 26 863–892.

Pfeifer, P. E. and Deutsch, S. J. (1980). A three-stage iterative procedure for space-time modeling. Technometrics 22 35–47.

Prado, R. and West, M. (2010). Time Series Modeling, Computation and Inference. Chapman & Hall/CRC, Boca Raton.

Quoreshi, A. M. M. S. (2008). A long memory count data time series model for financial application. Preprint.

R Development Core Team, (2010). R: A Language and Environment for Statistical Computing. http://www.R-project.org.

Ravishanker, N. and Ray, B. K. (1997). Bayesian analysis of vector ARMA models using Gibbs sampling. Journal of Forecasting 16 177–194.

Ravishanker, N. and Ray, B. K. (2002). Bayesian prediction for vector ARFIMA processes. International Journal of Forecasting 18 207–214.

Reinsel, G. C. (1997). Elements of Multivariate Time Series Analysis. Springer, New York.

Resnick, S. I. and Willekens, E. (1991). Moving averages with random coefficients and random coefficient autoregressive models. Communications in Statistics. Stochastic Models 7 511–525.

Rootzén, H. (1986). Extreme value theory for moving average processes. The Annals of Probability 14 612–652.

Scotto, M. G. (2007). Extremes for solutions to stochastic difference equations with regularly varying tails. REVSTAT–Statistical Journal 5 229–247.

Shao, Q. and Lund, R. (2004). Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. Journal of Time Series Analysis 25 359–372.

Shumway, R. H. and Stoffer, D. S. (2006). Time Series Analysis and its Applications: With R Examples, 2nd ed. Springer, New York.

Silvennoinen, A. and Teräsvirta, T. (2009). Multivariate GARCH models. In Handbook of Financial Time Series ( T. Andersen, R. Davis, J. Kreib, and T. Mikosch, eds.) Springer, New York.

Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. Journal of Econometrics 53 165–188.

Startz, R. (2008). Binomial autoregressive moving average models with an application to U.S. recessions. Journal of Business and Economic Statistics 26 1–8.

Stramer, O., Tweedie, R. L. and Brockwell, P. J. (1996). Existence and stability of continuous time threshold ARMA processes. Statistica Sinica 6 715–732.

Subba Rao, T. (1981). On the theory of bilinear time series models. Journal of the Royal Statistical Society. Series B (Methodological) 43 244–255.

Tong, H. and Lim, K. S. (1980). Threshold autoregression, limit cycles and cyclical data. Journal of the Royal Statistical Society. Series B (Methodological) 42 245–292.

Troutman, B. M. (1979). Some results in periodic autoregression. Biometrika 66 219–228.

Tsai, H. (2009). On continuous-time autoregressive fractionally integrated moving average processes. Bernoulli 15 178–194.

Tsai, H. and Chan, K. S. (2000). A note on the covariance structure of a continuous-time ARMA process. Statistica Sinica 10 989–998.

Tsai, H. and Chan, K. S. (2005). Maximum likelihood estimation of linear continuous time long memory processes with discrete time data. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 67 703–716.

Tsai, H. and Chan, K. S. (2008). A note on inequality constraints in the GARCH model. Econometric Theory 24 823–828.

Tsay, R. S. (1989). Parsimonious parameterization of vector autoregressive moving average models. Journal of Business and Economic Statistics 7 327–341.

Tunnicliffe-Wilson, G. (1979). Some efficient computational procedures for high order ARMA models. Journal of Statistical Computation and Simulation 8 301–309.

Ursu, E. and Duchesne, P. (2009). On modelling and diagnostic checking of vector periodic autoregressive time series models. Journal of Time Series Analysis 30 70–96.

Vecchia, A. V. (1985a). Maximum likelihood estimation for periodic autoregressive moving average models. Technometrics 27 375–384.

Vecchia, A. V. (1985b). Periodic autoregressive-moving average (PARMA) modeling with applications to water resources. Journal of the American Water Resources Association 21 721–730.

Vidakovic, B. (1999). Statistical Modeling by Wavelets. John Wiley & Sons, New York.

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models, 2nd ed. Springer, New York.

Wold, H. (1954). A Study in the Analysis of Stationary Time Series. Almquist & Wiksell, Stockholm.

Woodward, W. A., Cheng, Q. C. and Gray, H. L. (1998). A k-factor GARMA long-memory model. Journal of Time Series Analysis 19 485–504.

Zivot, E. and Wang, J. (2006). Modeling Financial Time Series with S-PLUS, 2nd ed. Springer, New York.




ar

Primal and dual model representations in kernel-based learning

Johan A.K. Suykens, Carlos Alzate, Kristiaan Pelckmans

Source: Statist. Surv., Volume 4, 148--183.

Abstract:
This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The specification of the estimation problem is conceived at the primal level as a constrained optimization problem. The constraints relate to the model which is expressed in terms of the feature map. From the conditions for optimality one jointly finds the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic for a support vector machine methodology. It is discussed how least squares support vector machines are playing a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization.




ar

Discrete variations of the fractional Brownian motion in the presence of outliers and an additive noise

Sophie Achard, Jean-François Coeurjolly

Source: Statist. Surv., Volume 4, 117--147.

Abstract:
This paper gives an overview of the problem of estimating the Hurst parameter of a fractional Brownian motion when the data are observed with outliers and/or with an additive noise by using methods based on discrete variations. We show that the classical estimation procedure based on the log-linearity of the variogram of dilated series is made more robust to outliers and/or an additive noise by considering sample quantiles and trimmed means of the squared series or differences of empirical variances. These different procedures are compared and discussed through a large simulation study and are implemented in the R package dvfBm.




ar

Start your Chinese Family Search at the State Library of...

Start your Chinese Family Search at the State Library of NSW   One in ten Sydneysiders claims Chinese ancestry




ar

Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis. (arXiv:2005.02535v1 [econ.EM] CROSS LISTED)

Arctic sea ice extent (SIE) in September 2019 ranked second-to-lowest in history and is trending downward. The understanding of how internal variability amplifies the effects of external $ ext{CO}_2$ forcing is still limited. We propose the VARCTIC, which is a Vector Autoregression (VAR) designed to capture and extrapolate Arctic feedback loops. VARs are dynamic simultaneous systems of equations, routinely estimated to predict and understand the interactions of multiple macroeconomic time series. Hence, the VARCTIC is a parsimonious compromise between fullblown climate models and purely statistical approaches that usually offer little explanation of the underlying mechanism. Our "business as usual" completely unconditional forecast has SIE hitting 0 in September by the 2060s. Impulse response functions reveal that anthropogenic $ ext{CO}_2$ emission shocks have a permanent effect on SIE - a property shared by no other shock. Further, we find Albedo- and Thickness-based feedbacks to be the main amplification channels through which $ ext{CO}_2$ anomalies impact SIE in the short/medium run. Conditional forecast analyses reveal that the future path of SIE crucially depends on the evolution of $ ext{CO}_2$ emissions, with outcomes ranging from recovering SIE to it reaching 0 in the 2050s. Finally, Albedo and Thickness feedbacks are shown to play an important role in accelerating the speed at which predicted SIE is heading towards 0.




ar

Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns. (arXiv:2005.02589v2 [cs.LG] UPDATED)

Application and use of deep learning algorithms for different healthcare applications is gaining interest at a steady pace. However, use of such algorithms can prove to be challenging as they require large amounts of training data that capture different possible variations. This makes it difficult to use them in a clinical setting since in most health applications researchers often have to work with limited data. Less data can cause the deep learning model to over-fit. In this paper, we ask how can we use data from a different environment, different use-case, with widely differing data distributions. We exemplify this use case by using single-sensor accelerometer data from healthy subjects performing activities of daily living - ADLs (source dataset), to extract features relevant to multi-sensor accelerometer gait data (target dataset) for Parkinson's disease classification. We train the pre-trained model using the source dataset and use it as a feature extractor. We show that the features extracted for the target dataset can be used to train an effective classification model. Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model. We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has over the task of Parkinson's disease classification.




ar

Statistical errors in Monte Carlo-based inference for random elements. (arXiv:2005.02532v2 [math.ST] UPDATED)

Monte Carlo simulation is useful to compute or estimate expected functionals of random elements if those random samples are possible to be generated from the true distribution. However, when the distribution has some unknown parameters, the samples must be generated from an estimated distribution with the parameters replaced by some estimators, which causes a statistical error in Monte Carlo estimation. This paper considers such a statistical error and investigates the asymptotic distributions of Monte Carlo-based estimators when the random elements are not only the real valued, but also functional valued random variables. We also investigate expected functionals for semimartingales in details. The consideration indicates that the Monte Carlo estimation can get worse when a semimartingale has a jump part with unremovable unknown parameters.




ar

Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED)

Methods for generating synthetic data have become of increasing importance to build large datasets required for Convolution Neural Networks (CNN) based deep learning techniques for a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to provide 3D facial models. For the proposed research work we have used tufts datasets for generating 3D varying face poses by using a single frontal face pose. The system works by refining the existing image quality by performing fusion based image preprocessing operations. The refined outputs have better contrast adjustments, decreased noise level and higher exposedness of the dark regions. It makes the facial landmarks and temperature patterns on the human face more discernible and visible when compared to original raw data. Different image quality metrics are used to compare the refined version of images with original images. In the next phase of the proposed study, the refined version of images is used to create 3D facial geometry structures by using Convolution Neural Networks (CNN). The generated outputs are then imported in blender software to finally extract the 3D thermal facial outputs of both males and females. The same technique is also used on our thermal face data acquired using prototype thermal camera (developed under Heliaus EU project) in an indoor lab environment which is then used for generating synthetic 3D face data along with varying yaw face angles and lastly facial depth map is generated.




ar

Interpreting Rate-Distortion of Variational Autoencoder and Using Model Uncertainty for Anomaly Detection. (arXiv:2005.01889v2 [cs.LG] UPDATED)

Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One of the prevalent methods is using a reconstruction error from variational autoencoder (VAE) via maximizing the evidence lower bound. We revisit VAE from the perspective of information theory to provide some theoretical foundations on using the reconstruction error, and finally arrive at a simpler and more effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the metric. We show empirically the competitive performance of our approach on benchmark datasets.




ar

How many modes can a constrained Gaussian mixture have?. (arXiv:2005.01580v2 [math.ST] UPDATED)

We show, by an explicit construction, that a mixture of univariate Gaussians with variance 1 and means in $[-A,A]$ can have $Omega(A^2)$ modes. This disproves a recent conjecture of Dytso, Yagli, Poor and Shamai [IEEE Trans. Inform. Theory, Apr. 2020], who showed that such a mixture can have at most $O(A^2)$ modes and surmised that the upper bound could be improved to $O(A)$. Our result holds even if an additional variance constraint is imposed on the mixing distribution. Extending the result to higher dimensions, we exhibit a mixture of Gaussians in $mathbb{R}^d$, with identity covariances and means inside $[-A,A]^d$, that has $Omega(A^{2d})$ modes.




ar

Is the NUTS algorithm correct?. (arXiv:2005.01336v2 [stat.CO] UPDATED)

This paper is devoted to investigate whether the popular No U-turn (NUTS) sampling algorithm is correct, i.e. whether the target probability distribution is emph{exactly} conserved by the algorithm. It turns out that one of the Gibbs substeps used in the algorithm cannot always be guaranteed to be correct.




ar

Can a powerful neural network be a teacher for a weaker neural network?. (arXiv:2005.00393v2 [cs.LG] UPDATED)

The transfer learning technique is widely used to learning in one context and applying it to another, i.e. the capacity to apply acquired knowledge and skills to new situations. But is it possible to transfer the learning from a deep neural network to a weaker neural network? Is it possible to improve the performance of a weak neural network using the knowledge acquired by a more powerful neural network? In this work, during the training process of a weak network, we add a loss function that minimizes the distance between the features previously learned from a strong neural network with the features that the weak network must try to learn. To demonstrate the effectiveness and robustness of our approach, we conducted a large number of experiments using three known datasets and demonstrated that a weak neural network can increase its performance if its learning process is driven by a more powerful neural network.




ar

Data-Space Inversion Using a Recurrent Autoencoder for Time-Series Parameterization. (arXiv:2005.00061v2 [stat.ML] UPDATED)

Data-space inversion (DSI) and related procedures represent a family of methods applicable for data assimilation in subsurface flow settings. These methods differ from model-based techniques in that they provide only posterior predictions for quantities (time series) of interest, not posterior models with calibrated parameters. DSI methods require a large number of flow simulations to first be performed on prior geological realizations. Given observed data, posterior predictions can then be generated directly. DSI operates in a Bayesian setting and provides posterior samples of the data vector. In this work we develop and evaluate a new approach for data parameterization in DSI. Parameterization reduces the number of variables to determine in the inversion, and it maintains the physical character of the data variables. The new parameterization uses a recurrent autoencoder (RAE) for dimension reduction, and a long-short-term memory (LSTM) network to represent flow-rate time series. The RAE-based parameterization is combined with an ensemble smoother with multiple data assimilation (ESMDA) for posterior generation. Results are presented for two- and three-phase flow in a 2D channelized system and a 3D multi-Gaussian model. The RAE procedure, along with existing DSI treatments, are assessed through comparison to reference rejection sampling (RS) results. The new DSI methodology is shown to consistently outperform existing approaches, in terms of statistical agreement with RS results. The method is also shown to accurately capture derived quantities, which are computed from variables considered directly in DSI. This requires correlation and covariance between variables to be properly captured, and accuracy in these relationships is demonstrated. The RAE-based parameterization developed here is clearly useful in DSI, and it may also find application in other subsurface flow problems.




ar

Short-term forecasts of COVID-19 spread across Indian states until 1 May 2020. (arXiv:2004.13538v2 [q-bio.PE] UPDATED)

The very first case of corona-virus illness was recorded on 30 January 2020, in India and the number of infected cases, including the death toll, continues to rise. In this paper, we present short-term forecasts of COVID-19 for 28 Indian states and five union territories using real-time data from 30 January to 21 April 2020. Applying Holt's second-order exponential smoothing method and autoregressive integrated moving average (ARIMA) model, we generate 10-day ahead forecasts of the likely number of infected cases and deaths in India for 22 April to 1 May 2020. Our results show that the number of cumulative cases in India will rise to 36335.63 [PI 95% (30884.56, 42918.87)], concurrently the number of deaths may increase to 1099.38 [PI 95% (959.77, 1553.76)] by 1 May 2020. Further, we have divided the country into severity zones based on the cumulative cases. According to this analysis, Maharashtra is likely to be the most affected states with around 9787.24 [PI 95% (6949.81, 13757.06)] cumulative cases by 1 May 2020. However, Kerala and Karnataka are likely to shift from the red zone (i.e. highly affected) to the lesser affected region. On the other hand, Gujarat and Madhya Pradesh will move to the red zone. These results mark the states where lockdown by 3 May 2020, can be loosened.




ar

A bimodal gamma distribution: Properties, regression model and applications. (arXiv:2004.12491v2 [stat.ME] UPDATED)

In this paper we propose a bimodal gamma distribution using a quadratic transformation based on the alpha-skew-normal model. We discuss several properties of this distribution such as mean, variance, moments, hazard rate and entropy measures. Further, we propose a new regression model with censored data based on the bimodal gamma distribution. This regression model can be very useful to the analysis of real data and could give more realistic fits than other special regression models. Monte Carlo simulations were performed to check the bias in the maximum likelihood estimation. The proposed models are applied to two real data sets found in literature.




ar

A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging. (arXiv:2004.12314v3 [cs.CV] UPDATED)

Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to its attenuated contrast. Since most clinical studies have relied on manual and labor-intensive approaches, automatic methods are of high interest, particularly optimized machine learning approaches. To address this, we organized the "2018 Left Atrium Segmentation Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset, and associated labels of the left atrium segmented by three medical experts, ultimately attracting the participation of 27 international teams. In this paper, extensive analysis of the submitted algorithms using technical and biological metrics was performed by undergoing subgroup analysis and conducting hyper-parameter analysis, offering an overall picture of the major design choices of convolutional neural networks (CNNs) and practical considerations for achieving state-of-the-art left atrium segmentation. Results show the top method achieved a dice score of 93.2% and a mean surface to a surface distance of 0.7 mm, significantly outperforming prior state-of-the-art. Particularly, our analysis demonstrated that double, sequentially used CNNs, in which a first CNN is used for automatic region-of-interest localization and a subsequent CNN is used for refined regional segmentation, achieved far superior results than traditional methods and pipelines containing single CNNs. This large-scale benchmarking study makes a significant step towards much-improved segmentation methods for cardiac LGE-MRIs, and will serve as an important benchmark for evaluating and comparing the future works in the field.




ar

Excess registered deaths in England and Wales during the COVID-19 pandemic, March 2020 and April 2020. (arXiv:2004.11355v4 [stat.AP] UPDATED)

Official counts of COVID-19 deaths have been criticized for potentially including people who did not die of COVID-19 but merely died with COVID-19. I address that critique by fitting a generalized additive model to weekly counts of all registered deaths in England and Wales during the 2010s. The model produces baseline rates of death registrations expected in the absence of the COVID-19 pandemic, and comparing those baselines to recent counts of registered deaths exposes the emergence of excess deaths late in March 2020. Among adults aged 45+, about 38,700 excess deaths were registered in the 5 weeks comprising 21 March through 24 April (612 $pm$ 416 from 21$-$27 March, 5675 $pm$ 439 from 28 March through 3 April, then 9183 $pm$ 468, 12,712 $pm$ 589, and 10,511 $pm$ 567 in April's next 3 weeks). Both the Office for National Statistics's respective count of 26,891 death certificates which mention COVID-19, and the Department of Health and Social Care's hospital-focused count of 21,222 deaths, are appreciably less, implying that their counting methods have underestimated rather than overestimated the pandemic's true death toll. If underreporting rates have held steady, about 45,900 direct and indirect COVID-19 deaths might have been registered by April's end but not yet publicly reported in full.




ar

On a phase transition in general order spline regression. (arXiv:2004.10922v2 [math.ST] UPDATED)

In the Gaussian sequence model $Y= heta_0 + varepsilon$ in $mathbb{R}^n$, we study the fundamental limit of approximating the signal $ heta_0$ by a class $Theta(d,d_0,k)$ of (generalized) splines with free knots. Here $d$ is the degree of the spline, $d_0$ is the order of differentiability at each inner knot, and $k$ is the maximal number of pieces. We show that, given any integer $dgeq 0$ and $d_0in{-1,0,ldots,d-1}$, the minimax rate of estimation over $Theta(d,d_0,k)$ exhibits the following phase transition: egin{equation*} egin{aligned} inf_{widetilde{ heta}}sup_{ hetainTheta(d,d_0, k)}mathbb{E}_ heta|widetilde{ heta} - heta|^2 asymp_d egin{cases} kloglog(16n/k), & 2leq kleq k_0,\ klog(en/k), & k geq k_0+1. end{cases} end{aligned} end{equation*} The transition boundary $k_0$, which takes the form $lfloor{(d+1)/(d-d_0) floor} + 1$, demonstrates the critical role of the regularity parameter $d_0$ in the separation between a faster $log log(16n)$ and a slower $log(en)$ rate. We further show that, once encouraging an additional '$d$-monotonicity' shape constraint (including monotonicity for $d = 0$ and convexity for $d=1$), the above phase transition is eliminated and the faster $kloglog(16n/k)$ rate can be achieved for all $k$. These results provide theoretical support for developing $ell_0$-penalized (shape-constrained) spline regression procedures as useful alternatives to $ell_1$- and $ell_2$-penalized ones.




ar

A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting. (arXiv:2004.09612v3 [cs.LG] UPDATED)

Cooperation between different data owners may lead to an improvement in forecast quality - for instance by benefiting from spatial-temporal dependencies in geographically distributed time series. Due to business competitive factors and personal data protection questions, said data owners might be unwilling to share their data, which increases the interest in collaborative privacy-preserving forecasting. This paper analyses the state-of-the-art and unveils several shortcomings of existing methods in guaranteeing data privacy when employing Vector Autoregressive (VAR) models. The paper also provides mathematical proofs and numerical analysis to evaluate existing privacy-preserving methods, dividing them into three groups: data transformation, secure multi-party computations, and decomposition methods. The analysis shows that state-of-the-art techniques have limitations in preserving data privacy, such as a trade-off between privacy and forecasting accuracy, while the original data in iterative model fitting processes, in which intermediate results are shared, can be inferred after some iterations.




ar

Deep transfer learning for improving single-EEG arousal detection. (arXiv:2004.05111v2 [cs.CV] UPDATED)

Datasets in sleep science present challenges for machine learning algorithms due to differences in recording setups across clinics. We investigate two deep transfer learning strategies for overcoming the channel mismatch problem for cases where two datasets do not contain exactly the same setup leading to degraded performance in single-EEG models. Specifically, we train a baseline model on multivariate polysomnography data and subsequently replace the first two layers to prepare the architecture for single-channel electroencephalography data. Using a fine-tuning strategy, our model yields similar performance to the baseline model (F1=0.682 and F1=0.694, respectively), and was significantly better than a comparable single-channel model. Our results are promising for researchers working with small databases who wish to use deep learning models pre-trained on larger databases.




ar

Strong Converse for Testing Against Independence over a Noisy channel. (arXiv:2004.00775v2 [cs.IT] UPDATED)

A distributed binary hypothesis testing (HT) problem over a noisy (discrete and memoryless) channel studied previously by the authors is investigated from the perspective of the strong converse property. It was shown by Ahlswede and Csisz'{a}r that a strong converse holds in the above setting when the channel is rate-limited and noiseless. Motivated by this observation, we show that the strong converse continues to hold in the noisy channel setting for a special case of HT known as testing against independence (TAI), under the assumption that the channel transition matrix has non-zero elements. The proof utilizes the blowing up lemma and the recent change of measure technique of Tyagi and Watanabe as the key tools.




ar

Capturing and Explaining Trajectory Singularities using Composite Signal Neural Networks. (arXiv:2003.10810v2 [cs.LG] UPDATED)

Spatial trajectories are ubiquitous and complex signals. Their analysis is crucial in many research fields, from urban planning to neuroscience. Several approaches have been proposed to cluster trajectories. They rely on hand-crafted features, which struggle to capture the spatio-temporal complexity of the signal, or on Artificial Neural Networks (ANNs) which can be more efficient but less interpretable. In this paper we present a novel ANN architecture designed to capture the spatio-temporal patterns characteristic of a set of trajectories, while taking into account the demographics of the navigators. Hence, our model extracts markers linked to both behaviour and demographics. We propose a composite signal analyser (CompSNN) combining three simple ANN modules. Each of these modules uses different signal representations of the trajectory while remaining interpretable. Our CompSNN performs significantly better than its modules taken in isolation and allows to visualise which parts of the signal were most useful to discriminate the trajectories.




ar

Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach. (arXiv:2003.02157v2 [physics.soc-ph] UPDATED)

In recent years, multi-access edge computing (MEC) is a key enabler for handling the massive expansion of Internet of Things (IoT) applications and services. However, energy consumption of a MEC network depends on volatile tasks that induces risk for energy demand estimations. As an energy supplier, a microgrid can facilitate seamless energy supply. However, the risk associated with energy supply is also increased due to unpredictable energy generation from renewable and non-renewable sources. Especially, the risk of energy shortfall is involved with uncertainties in both energy consumption and generation. In this paper, we study a risk-aware energy scheduling problem for a microgrid-powered MEC network. First, we formulate an optimization problem considering the conditional value-at-risk (CVaR) measurement for both energy consumption and generation, where the objective is to minimize the loss of energy shortfall of the MEC networks and we show this problem is an NP-hard problem. Second, we analyze our formulated problem using a multi-agent stochastic game that ensures the joint policy Nash equilibrium, and show the convergence of the proposed model. Third, we derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks. This method mitigates the curse of dimensionality of the state space and chooses the best policy among the agents for the proposed problem. Finally, the experimental results establish a significant performance gain by considering CVaR for high accuracy energy scheduling of the proposed model than both the single and random agent models.




ar

Mnemonics Training: Multi-Class Incremental Learning without Forgetting. (arXiv:2002.10211v3 [cs.CV] UPDATED)

Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off to effectively learning new concepts without catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset and ImageNet, and show that using mnemonics exemplars can surpass the state-of-the-art by a large margin. Interestingly and quite intriguingly, the mnemonics exemplars tend to be on the boundaries between different classes.




ar

A Distributionally Robust Area Under Curve Maximization Model. (arXiv:2002.07345v2 [math.OC] UPDATED)

Area under ROC curve (AUC) is a widely used performance measure for classification models. We propose two new distributionally robust AUC maximization models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function. We consider the two cases with respectively fixed and variable support for the worst-case distribution. We use duality theory to reformulate the DR-AUC models and derive tractable convex optimization problems. The numerical experiments show that the proposed DR-AUC models -- benchmarked with the standard deterministic AUC and the support vector machine models - perform better in general and in particular improve the worst-case out-of-sample performance over the majority of the considered datasets, thereby showing their robustness. The results are particularly encouraging since our numerical experiments are conducted with training sets of small size which have been known to be conducive to low out-of-sample performance.




ar

Statistical aspects of nuclear mass models. (arXiv:2002.04151v3 [nucl-th] UPDATED)

We study the information content of nuclear masses from the perspective of global models of nuclear binding energies. To this end, we employ a number of statistical methods and diagnostic tools, including Bayesian calibration, Bayesian model averaging, chi-square correlation analysis, principal component analysis, and empirical coverage probability. Using a Bayesian framework, we investigate the structure of the 4-parameter Liquid Drop Model by considering discrepant mass domains for calibration. We then use the chi-square correlation framework to analyze the 14-parameter Skyrme energy density functional calibrated using homogeneous and heterogeneous datasets. We show that a quite dramatic parameter reduction can be achieved in both cases. The advantage of Bayesian model averaging for improving uncertainty quantification is demonstrated. The statistical approaches used are pedagogically described; in this context this work can serve as a guide for future applications.




ar

Cyclic Boosting -- an explainable supervised machine learning algorithm. (arXiv:2002.03425v2 [cs.LG] UPDATED)

Supervised machine learning algorithms have seen spectacular advances and surpassed human level performance in a wide range of specific applications. However, using complex ensemble or deep learning algorithms typically results in black box models, where the path leading to individual predictions cannot be followed in detail. In order to address this issue, we propose the novel "Cyclic Boosting" machine learning algorithm, which allows to efficiently perform accurate regression and classification tasks while at the same time allowing a detailed understanding of how each individual prediction was made.