pe Spectral method and regularized MLE are both optimal for top-$K$ ranking By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang. Source: The Annals of Statistics, Volume 47, Number 4, 2204--2235.Abstract: This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $\sin\Theta$ theorem for symmetric matrices.
This also allows us to close the gap between the $\ell_{2}$ error upper bound for the spectral method and the minimax lower limit. Full Article
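The spectral method discussed in this abstract follows the Rank Centrality recipe: build a Markov chain on the items whose transitions favour the winner of each comparison, and read the preference scores off its stationary distribution. The sketch below illustrates only that general recipe, not the paper's exact construction; the degree bound `d` and the population-level `wins` matrix are assumptions of the example.

```python
import numpy as np

def spectral_scores(wins, d):
    """Spectral (Rank Centrality-style) estimate of BTL preference scores.

    wins[i, j]: fraction of comparisons between items i and j won by j
    (0 if the pair was never compared); d: a bound on the number of
    opponents per item, so that off-diagonal rows stay substochastic.
    """
    P = wins / d
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # lazy self-loops make rows sum to 1
    vals, vecs = np.linalg.eig(P.T)            # stationary distribution = left
    pi = np.real(vecs[:, np.argmax(np.real(vals))])  # eigenvector for eigenvalue 1
    pi = np.abs(pi) / np.abs(pi).sum()
    return pi

def top_k(pi, K):
    """Indices of the K highest-scoring items."""
    return set(np.argsort(pi)[-K:].tolist())
```

With population-level win fractions $w_j/(w_i+w_j)$ the chain is reversible and its stationary distribution is exactly proportional to the latent scores $w$, which is what makes the top-$K$ set recoverable.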
pe interoperability By looselycoupled.com Published On :: 2003-08-07T17:00:00-00:00 Ability to work with each other. In the loosely coupled environment of a service-oriented architecture, separate resources don't need to know the details of how they each work, but they need to have enough common ground to reliably exchange messages without error or misunderstanding. Standardized specifications go a long way towards creating this common ground, but differences in implementation may still lead to breakdowns in communication. Interoperability is when services can interact with each other without encountering such problems. Full Article
pe A hierarchical dependent Dirichlet process prior for modelling bird migration patterns in the UK By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Alex Diana, Eleni Matechou, Jim Griffin, Alison Johnston. Source: The Annals of Applied Statistics, Volume 14, Number 1, 473--493.Abstract: Environmental changes in recent years have been linked to phenological shifts which in turn are linked to the survival of species. The work in this paper is motivated by capture-recapture data on blackcaps collected by the British Trust for Ornithology as part of the Constant Effort Sites monitoring scheme. Blackcaps overwinter abroad and migrate to the UK annually for breeding purposes. We propose a novel Bayesian nonparametric approach for expressing the bivariate density of individual arrival and departure times at different sites across a number of years as a mixture model. The new model combines the ideas of the hierarchical and the dependent Dirichlet process, allowing the estimation of site-specific weights and year-specific mixture locations, which are modelled as functions of environmental covariates using a multivariate extension of the Gaussian process. The proposed modelling framework is extremely general and can be used in any context where multivariate density estimation is performed jointly across different groups and in the presence of a continuous covariate. Full Article
pe A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Zhonghua Liu, Ian Barnett, Xihong Lin. Source: The Annals of Applied Statistics, Volume 14, Number 1, 433--451.Abstract: Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testing in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower-order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum $p$-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings. Full Article
pe Modifying the Chi-square and the CMH test for population genetic inference: Adapting to overdispersion By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Kerstin Spitzer, Marta Pelizzola, Andreas Futschik. Source: The Annals of Applied Statistics, Volume 14, Number 1, 202--220.Abstract: Evolve and resequence studies provide a popular approach to simulate evolution in the lab and explore its genetic basis. In this context, Pearson’s chi-square test, Fisher’s exact test as well as the Cochran–Mantel–Haenszel test are commonly used to infer genomic positions affected by selection from temporal changes in allele frequency. However, the null model associated with these tests does not match the null hypothesis of actual interest. Indeed, due to genetic drift and possibly other additional noise components such as pool sequencing, the null variance in the data can be substantially larger than accounted for by these common test statistics. This leads to $p$-values that are systematically too small and, therefore, to a huge number of false positive results. Even if the ranking rather than the actual $p$-values is of interest, a naive application of the mentioned tests will give misleading results, as the amount of overdispersion varies from locus to locus. We therefore propose adjusted statistics that take the overdispersion into account while keeping the formulas simple. This is particularly useful in genome-wide applications, where millions of SNPs can be handled with little computational effort. We then apply the adapted test statistics to real data from Drosophila and investigate how information from intermediate generations can be included when available. We also discuss further applications such as genome-wide association studies based on pool sequencing data and tests for local adaptation. Full Article
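The adjustment described above can be sketched in two steps: compute the usual per-locus chi-square statistic, then rescale by a genome-wide overdispersion factor before converting to $p$-values. The moment-style estimator `rho` below is a simplified, hypothetical stand-in for the drift-aware estimators developed in the paper.

```python
import numpy as np
from math import erfc, sqrt

def chisq_stats(r0, a0, r1, a1):
    """Per-locus Pearson chi-square for a 2x2 table of allele counts
    (reference/alternate counts r0, a0 at generation 0 and r1, a1 later)."""
    N = r0 + a0 + r1 + a1
    num = N * (r0 * a1 - r1 * a0) ** 2
    den = (r0 + a0) * (r1 + a1) * (r0 + r1) * (a0 + a1)
    return num / den

def adjusted_pvalues(x2, df=1.0):
    """Divide each statistic by a genome-wide overdispersion estimate.

    rho is estimated from the bulk of (putatively neutral) loci as the
    ratio of the mean statistic to its nominal df; the result is converted
    to a p-value via the chi-square(1) tail erfc(sqrt(x/2)).
    """
    rho = max(np.mean(x2) / df, 1.0)   # never deflate below the nominal null
    x2_adj = x2 / rho
    pvals = np.array([erfc(sqrt(x / 2.0)) for x in x2_adj])
    return pvals, rho
```

With threefold overdispersion, the unadjusted $p$-values pile up near zero, while the adjusted ones are approximately uniform under the null, which is the behaviour the abstract asks for.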
pe Surface temperature monitoring in liver procurement via functional variance change-point analysis By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Zhenguo Gao, Pang Du, Ran Jin, John L. Robertson. Source: The Annals of Applied Statistics, Volume 14, Number 1, 143--159.Abstract: Liver procurement experiments with surface-temperature monitoring motivated Gao et al. ( J. Amer. Statist. Assoc. 114 (2019) 773–781) to develop a variance change-point detection method under a smoothly-changing mean trend. However, the spotwise change points yielded from their method do not offer immediate information to surgeons since an organ is often transplanted as a whole or in part. We develop a new practical method that can analyze a defined portion of the organ surface at a time. It also provides a novel addition to the developing field of functional data monitoring. Furthermore, a numerical challenge emerges for simultaneously modeling the variance functions of 2D locations and the mean function of location and time. The respective sample sizes, on the scales of 10,000 and 1,000,000 for modeling these functions, make standard spline estimation too costly to be useful. We introduce a multistage subsampling strategy with steps educated by quickly-computable preliminary statistical measures. Extensive simulations show that the new method can efficiently reduce the computational cost and provide reasonable parameter estimates. Application of the new method to our liver surface temperature monitoring data shows its effectiveness in providing accurate status change information for a selected portion of the organ in the experiment. Full Article
pe BART with targeted smoothing: An analysis of patient-specific stillbirth risk By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Jennifer E. Starling, Jared S. Murray, Carlos M. Carvalho, Radek K. Bukowski, James G. Scott. Source: The Annals of Applied Statistics, Volume 14, Number 1, 28--50.Abstract: This article introduces BART with Targeted Smoothing, or tsBART, a new Bayesian tree-based model for nonparametric regression. The goal of tsBART is to introduce smoothness over a single target covariate $t$ while not necessarily requiring smoothness over other covariates $x$. tsBART is based on the Bayesian Additive Regression Trees (BART) model, an ensemble of regression trees. tsBART extends BART by parameterizing each tree’s terminal nodes with smooth functions of $t$ rather than independent scalars. Like BART, tsBART captures complex nonlinear relationships and interactions among the predictors. But unlike BART, tsBART guarantees that the response surface will be smooth in the target covariate. This improves interpretability and helps to regularize the estimate. After introducing and benchmarking the tsBART model, we apply it to our motivating example—pregnancy outcomes data from the National Center for Health Statistics. Our aim is to provide patient-specific estimates of stillbirth risk across gestational age $(t)$ and based on maternal and fetal risk factors $(x)$. Obstetricians expect stillbirth risk to vary smoothly over gestational age but not necessarily over other covariates, and tsBART has been designed precisely to reflect this structural knowledge. The results of our analysis show the clear superiority of the tsBART model for quantifying stillbirth risk, thereby providing patients and doctors with better information for managing the risk of fetal mortality. All methods described here are implemented in the R package tsbart . Full Article
pe SHOPPER: A probabilistic model of consumer choice with substitutes and complements By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Francisco J. R. Ruiz, Susan Athey, David M. Blei. Source: The Annals of Applied Statistics, Volume 14, Number 1, 1--27.Abstract: We develop SHOPPER, a sequential probabilistic model of shopping data. SHOPPER uses interpretable components to model the forces that drive how a customer chooses products; in particular, we designed SHOPPER to capture how items interact with other items. We develop an efficient posterior inference algorithm to estimate these forces from large-scale data, and we analyze a large dataset from a major chain grocery store. We are interested in answering counterfactual queries about changes in prices. We found that SHOPPER provides accurate predictions even under price interventions, and that it helps identify complementary and substitutable pairs of products. Full Article
pe Empirical Bayes analysis of RNA sequencing experiments with auxiliary information By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Kun Liang. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2452--2482.Abstract: Finding differentially expressed genes is a common task in high-throughput transcriptome studies. While traditional statistical methods rank the genes by their test statistics alone, we analyze an RNA sequencing dataset using the auxiliary information of gene length and the test statistics from a related microarray study. Given the auxiliary information, we propose a novel nonparametric empirical Bayes procedure to estimate the posterior probability of differential expression for each gene. We demonstrate the advantage of our procedure in extensive simulation studies and a psoriasis RNA sequencing study. The companion R package calm is available at Bioconductor. Full Article
pe Propensity score weighting for causal inference with multiple treatments By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Fan Li, Fan Li. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2389--2415.Abstract: Causal or unconfounded descriptive comparisons between multiple groups are common in observational studies. Motivated from a racial disparity study in health services research, we propose a unified propensity score weighting framework, the balancing weights, for estimating causal effects with multiple treatments. These weights incorporate the generalized propensity scores to balance the weighted covariate distribution of each treatment group, all weighted toward a common prespecified target population. The class of balancing weights include several existing approaches such as the inverse probability weights and trimming weights as special cases. Within this framework, we propose a set of target estimands based on linear contrasts. We further develop the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weighting scheme corresponds to the target population with the most overlap in covariates across the multiple treatments. These weights are bounded and thus bypass the problem of extreme propensities. We show that the generalized overlap weights minimize the total asymptotic variance of the moment weighting estimators for the pairwise contrasts within the class of balancing weights. We consider two balance check criteria and propose a new sandwich variance estimator for estimating the causal effects with generalized overlap weights. We apply these methods to study the racial disparities in medical expenditure between several racial groups using the 2009 Medical Expenditure Panel Survey (MEPS) data. Simulations were carried out to compare with existing methods. Full Article
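The generalized overlap weights described above have a simple closed form: weight each unit by the harmonic-mean tilting function of its generalized propensity scores divided by the propensity of the treatment it actually received. A minimal sketch, assuming the matrix `gps` comes from an already-fitted multinomial propensity model (the fitting step is outside this example):

```python
import numpy as np

def generalized_overlap_weights(gps, z):
    """Generalized overlap weights for J treatment groups.

    gps: (n, J) generalized propensity scores, rows summing to 1;
    z:   (n,) observed treatment labels in {0, ..., J-1}.
    The tilting function h(x) = (sum_k 1/e_k(x))^{-1} is (proportional to)
    the harmonic mean of the propensities, so the weights are bounded.
    """
    h = 1.0 / (1.0 / gps).sum(axis=1)      # harmonic-mean tilting function
    e_obs = gps[np.arange(len(z)), z]      # propensity of the received treatment
    w = h / e_obs
    for k in range(gps.shape[1]):          # Hajek-style normalisation per group
        mask = z == k
        w[mask] = w[mask] / w[mask].sum()
    return w
```

In the two-treatment case this reduces to the classic overlap weights, where each unit is weighted by the probability of receiving the opposite treatment.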
pe A nonparametric spatial test to identify factors that shape a microbiome By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Susheela P. Singh, Ana-Maria Staicu, Robert R. Dunn, Noah Fierer, Brian J. Reich. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2341--2362.Abstract: The advent of high-throughput sequencing technologies has made data from DNA material readily available, leading to a surge of microbiome-related research establishing links between markers of microbiome health and specific outcomes. However, to harness the power of microbial communities we must understand not only how they affect us, but also how they can be influenced to improve outcomes. This area has been dominated by methods that reduce community composition to summary metrics, which can fail to fully exploit the complexity of community data. Recently, methods have been developed to model the abundance of taxa in a community, but they can be computationally intensive and do not account for spatial effects underlying microbial settlement. These spatial effects are particularly relevant in the microbiome setting because we expect communities that are close together to be more similar than those that are far apart. In this paper, we propose a flexible Bayesian spike-and-slab variable selection model for presence-absence indicators that accounts for spatial dependence and cross-dependence between taxa while reducing dimensionality in both directions. We show by simulation that in the presence of spatial dependence, popular distance-based hypothesis testing methods fail to preserve their advertised size, and the proposed method improves variable selection. Finally, we present an application of our method to an indoor fungal community found within homes across the contiguous United States. Full Article
pe A latent discrete Markov random field approach to identifying and classifying historical forest communities based on spatial multivariate tree species counts By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Stephen Berg, Jun Zhu, Murray K. Clayton, Monika E. Shea, David J. Mladenoff. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2312--2340.Abstract: The Wisconsin Public Land Survey database describes historical forest composition at high spatial resolution and is of interest in ecological studies of forest composition in Wisconsin just prior to significant Euro-American settlement. For such studies it is useful to identify recurring subpopulations of tree species known as communities, but standard clustering approaches for subpopulation identification do not account for dependence between spatially nearby observations. Here, we develop and fit a latent discrete Markov random field model for the purpose of identifying and classifying historical forest communities based on spatially referenced multivariate tree species counts across Wisconsin. We show empirically for the actual dataset and through simulation that our latent Markov random field modeling approach improves prediction and parameter estimation performance. For model fitting we introduce a new stochastic approximation algorithm which enables computationally efficient estimation and classification of large amounts of spatial multivariate count data. Full Article
pe Principal nested shape space analysis of molecular dynamics data By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Ian L. Dryden, Kwang-Rae Kim, Charles A. Laughton, Huiling Le. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2213--2234.Abstract: Molecular dynamics simulations produce huge datasets of temporal sequences of molecules. It is of interest to summarize the shape evolution of the molecules in a succinct, low-dimensional representation. However, Euclidean techniques such as principal components analysis (PCA) can be problematic as the data may lie far from a flat manifold. Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean subspace based PCA [ Biometrika 99 (2012) 551–568]. Subspaces of successively lower dimension are fitted to the data in a backwards manner with the aim of retaining signal and dispensing with noise at each stage. We adapt the methodology to 3D subshape spaces and provide some practical fitting algorithms. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored. Full Article
pe Estimating abundance from multiple sampling capture-recapture data via a multi-state multi-period stopover model By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Hannah Worthington, Rachel McCrea, Ruth King, Richard Griffiths. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2043--2064.Abstract: Capture-recapture studies often involve collecting data on numerous capture occasions over a relatively short period of time. For many study species this process is repeated, for example, annually, resulting in capture information spanning multiple sampling periods. To account for the different temporal scales, the robust design class of models have traditionally been applied providing a framework in which to analyse all of the available capture data in a single likelihood expression. However, these models typically require strong constraints, either the assumption of closure within a sampling period (the closed robust design) or conditioning on the number of individuals captured within a sampling period (the open robust design). For real datasets these assumptions may not be appropriate. We develop a general modelling structure that requires neither assumption by explicitly modelling the movement of individuals into the population both within and between the sampling periods, which in turn permits the estimation of abundance within a single consistent framework. The flexibility of the novel model structure is further demonstrated by including the computationally challenging case of multi-state data where there is individual time-varying discrete covariate information. We derive an efficient likelihood expression for the new multi-state multi-period stopover model using the hidden Markov model framework. 
We demonstrate the significant improvement in parameter estimation using our new modelling approach in terms of both the multi-period and multi-state components through both a simulation study and a real dataset relating to the protected species of great crested newts, Triturus cristatus . Full Article
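The hidden Markov machinery behind the likelihood is the standard forward algorithm: propagate a vector of state probabilities through transition and emission matrices, rescaling as you go. The generic single-history sketch below shows only that core; the paper's multi-state multi-period stopover model layers arrival, departure and covariate states on top of it.

```python
import numpy as np

def forward_loglik(init, trans, emit, obs):
    """Forward-algorithm log-likelihood of one observation history.

    init:  (S,) initial state distribution
    trans: (S, S) transition matrix, rows summing to 1
    emit:  (S, M) emission matrix, emit[s, o] = P(observe o | state s)
    obs:   sequence of observed symbols in {0, ..., M-1}
    """
    alpha = init * emit[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c                    # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik
```

Summing a filtered probability vector like this is what makes the likelihood of long capture histories computable in time linear in the number of occasions, rather than exponential in it.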
pe Wavelet spectral testing: Application to nonstationary circadian rhythms By projecteuclid.org Published On :: Wed, 16 Oct 2019 22:03 EDT Jessica K. Hargreaves, Marina I. Knight, Jon W. Pitchford, Rachael J. Oakenfull, Sangeeta Chawla, Jack Munns, Seth J. Davis. Source: The Annals of Applied Statistics, Volume 13, Number 3, 1817--1846.Abstract: Rhythmic data are ubiquitous in the life sciences. Biologists need reliable statistical tests to identify whether a particular experimental treatment has caused a significant change in a rhythmic signal. When these signals display nonstationary behaviour, as is common in many biological systems, the established methodologies may be misleading. Therefore, there is a real need for new methodology that enables the formal comparison of nonstationary processes. As circadian behaviour is best understood in the spectral domain, here we develop novel hypothesis testing procedures in the (wavelet) spectral domain, embedding replicate information when available. The data are modelled as realisations of locally stationary wavelet processes, allowing us to define and rigorously estimate their evolutionary wavelet spectra. Motivated by three complementary applications in circadian biology, our new methodology allows the identification of three specific types of spectral difference. We demonstrate the advantages of our methodology over alternative approaches, by means of a comprehensive simulation study and real data applications, using both published and newly generated circadian datasets. In contrast to the current standard methodologies, our method successfully identifies differences within the motivating circadian datasets, and facilitates wider ranging analyses of rhythmic biological data in general. Full Article
pe Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter? By projecteuclid.org Published On :: Wed, 16 Oct 2019 22:03 EDT Huiping Xu, Xiaochun Li, Changyu Shen, Siu L. Hui, Shaun Grannis. Source: The Annals of Applied Statistics, Volume 13, Number 3, 1753--1790.Abstract: The conditional independence assumption of the Fellegi and Sunter (FS) model in probabilistic record linkage is often violated when matching real-world data. Ignoring conditional dependence has been shown to seriously bias parameter estimates. However, in record linkage, the ultimate goal is to inform the match status of record pairs and therefore, record linkage algorithms should be evaluated in terms of matching accuracy. In the literature, more flexible models have been proposed to relax the conditional independence assumption, but few studies have assessed whether such accommodations improve matching accuracy. In this paper, we show that incorporating the conditional dependence appropriately yields comparable or improved matching accuracy than the FS model using three real-world data linkage examples. Through a simulation study, we further investigate when conditional dependence models provide improved matching accuracy. Our study shows that the FS model is generally robust to the conditional independence assumption and provides comparable matching accuracy as the more complex conditional dependence models. However, when the match prevalence approaches 0% or 100% and conditional dependence exists in the dominating class, it is necessary to address conditional dependence as the FS model produces suboptimal matching accuracy. The need to address conditional dependence becomes less important when highly discriminating fields are used.
Our simulation study also shows that conditional dependence models with misspecified dependence structure could produce less accurate record matching than the FS model and therefore we caution against the blind use of conditional dependence models. Full Article
pe Sequential decision model for inference and prediction on nonuniform hypergraphs with application to knot matching from computational forestry By projecteuclid.org Published On :: Wed, 16 Oct 2019 22:03 EDT Seong-Hwan Jun, Samuel W. K. Wong, James V. Zidek, Alexandre Bouchard-Côté. Source: The Annals of Applied Statistics, Volume 13, Number 3, 1678--1707.Abstract: In this paper, we consider the knot-matching problem arising in computational forestry. The knot-matching problem is an important problem that needs to be solved to advance the state of the art in automatic strength prediction of lumber. We show that this problem can be formulated as a quadripartite matching problem and develop a sequential decision model that admits efficient parameter estimation, along with a sequential Monte Carlo sampler that can be utilized for rapid sampling of graph matchings. We demonstrate the effectiveness of our methods on 30 manually annotated boards and present findings from various simulation studies to provide further evidence supporting the efficacy of our methods. Full Article
pe Modeling seasonality and serial dependence of electricity price curves with warping functional autoregressive dynamics By projecteuclid.org Published On :: Wed, 16 Oct 2019 22:03 EDT Ying Chen, J. S. Marron, Jiejie Zhang. Source: The Annals of Applied Statistics, Volume 13, Number 3, 1590--1616.Abstract: Electricity prices are high dimensional, serially dependent and have seasonal variations. We propose a Warping Functional AutoRegressive (WFAR) model that simultaneously accounts for the cross time-dependence and seasonal variations of the large dimensional data. In particular, electricity price curves are obtained by smoothing over the $24$ discrete hourly prices on each day. In the functional domain, seasonal phase variations are separated from level amplitude changes in a warping process with the Fisher–Rao distance metric, and the aligned (season-adjusted) electricity price curves are modeled in the functional autoregression framework. In a real application, the WFAR model provides superior out-of-sample forecast accuracy in both a normal functioning market, Nord Pool, and an extreme situation, the California market. The forecast performance as well as the relative accuracy improvement are stable for different markets and different time periods. Full Article
pe The classification permutation test: A flexible approach to testing for covariate imbalance in observational studies By projecteuclid.org Published On :: Wed, 16 Oct 2019 22:03 EDT Johann Gagnon-Bartsch, Yotam Shem-Tov. Source: The Annals of Applied Statistics, Volume 13, Number 3, 1464--1483.Abstract: The gold standard for identifying causal relationships is a randomized controlled experiment. In many applications in the social sciences and medicine, the researcher does not control the assignment mechanism and instead may rely upon natural experiments or matching methods as a substitute to experimental randomization. The standard testable implication of random assignment is covariate balance between the treated and control units. Covariate balance is commonly used to validate the claim of as good as random assignment. We propose a new nonparametric test of covariate balance. Our Classification Permutation Test (CPT) is based on a combination of classification methods (e.g., random forests) with Fisherian permutation inference. We revisit four real data examples and present Monte Carlo power simulations to demonstrate the applicability of the CPT relative to other nonparametric tests of equality of multivariate distributions. Full Article
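The CPT recipe above can be sketched directly: fit a classifier to distinguish treated from control units using the covariates, then compare its accuracy with the permutation distribution obtained by refitting on shuffled labels. A nearest-centroid classifier stands in here for the random forests used in the paper, purely to keep the sketch dependency-free.

```python
import numpy as np

def classification_permutation_test(X, z, n_perm=200, seed=0):
    """CPT-style covariate balance test (sketch).

    X: (n, d) covariates; z: (n,) binary treatment labels.
    Returns a Fisherian permutation p-value for the in-sample accuracy
    of a nearest-centroid classifier predicting z from X.
    """
    rng = np.random.default_rng(seed)

    def accuracy(labels):
        mu1 = X[labels == 1].mean(axis=0)
        mu0 = X[labels == 0].mean(axis=0)
        d1 = ((X - mu1) ** 2).sum(axis=1)
        d0 = ((X - mu0) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)
        return (pred == labels).mean()

    obs = accuracy(z)
    null = [accuracy(rng.permutation(z)) for _ in range(n_perm)]
    # add one for the observed statistic, as in exact permutation inference
    return (1 + sum(a >= obs for a in null)) / (1 + n_perm)
```

A small p-value means the classifier can tell the groups apart from covariates alone, i.e., covariate balance fails; under true balance, the observed accuracy is just another draw from the permutation distribution.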
pe Introduction to papers on the modeling and analysis of network data—II By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Stephen E. Fienberg. Source: Ann. Appl. Stat., Volume 4, Number 2, 533--534. Full Article
pe Stratonovich type integration with respect to fractional Brownian motion with Hurst parameter less than $1/2$ By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Jorge A. León. Source: Bernoulli, Volume 26, Number 3, 2436--2462.Abstract: Let $B^{H}$ be a fractional Brownian motion with Hurst parameter $H\in(0,1/2)$ and $p:\mathbb{R}\rightarrow\mathbb{R}$ a polynomial function. The main purpose of this paper is to introduce a Stratonovich type stochastic integral with respect to $B^{H}$, whose domain includes the process $p(B^{H})$. That is, an integral that allows us to integrate $p(B^{H})$ with respect to $B^{H}$, which does not happen with the symmetric integral given by Russo and Vallois ( Probab. Theory Related Fields 97 (1993) 403–421) in general. Towards this end, we combine the approaches utilized by León and Nualart ( Stochastic Process. Appl. 115 (2005) 481–492), and Russo and Vallois ( Probab. Theory Related Fields 97 (1993) 403–421), whose aims are to extend the domain of the divergence operator for Gaussian processes and to define some stochastic integrals, respectively. Then, we study the relation between this Stratonovich integral and the extension of the divergence operator (see León and Nualart ( Stochastic Process. Appl. 115 (2005) 481–492)), an Itô formula and the existence of a unique solution of some Stratonovich stochastic differential equations. These last results have been analyzed by Alòs, León and Nualart ( Taiwanese J. Math. 5 (2001) 609–632), where the Hurst parameter $H$ belongs to the interval $(1/4,1/2)$. Full Article
pe A refined Cramér-type moderate deviation for sums of local statistics By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Xiao Fang, Li Luo, Qi-Man Shao. Source: Bernoulli, Volume 26, Number 3, 2319--2352.Abstract: We prove a refined Cramér-type moderate deviation result by taking into account the skewness in normal approximation for sums of local statistics of independent random variables. We apply the main result to $k$-runs, U-statistics and subgraph counts in the Erdős–Rényi random graph. To prove our main result, we develop exponential concentration inequalities and higher-order tail probability expansions via Stein’s method. Full Article
pe Convergence of persistence diagrams for topological crackle By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Takashi Owada, Omer Bobrowski. Source: Bernoulli, Volume 26, Number 3, 2275--2310.Abstract: In this paper, we study the persistent homology associated with topological crackle generated by distributions with an unbounded support. Persistent homology is a topological and algebraic structure that tracks the creation and destruction of topological cycles (generalizations of loops or holes) in different dimensions. Topological crackle is a term that refers to topological cycles generated by random points far away from the bulk of other points, when the support is unbounded. We establish weak convergence results for persistence diagrams – a point process representation for persistent homology, where each topological cycle is represented by its $(\mathit{birth},\mathit{death})$ coordinates. In this work, we treat persistence diagrams as random closed sets, so that the resulting weak convergence is defined in terms of the Fell topology. Using this framework, we show that the limiting persistence diagrams can be divided into two parts. The first part is a deterministic limit containing a densely-growing number of persistence pairs with a shorter lifespan. The second part is a two-dimensional Poisson process, representing persistence pairs with a longer lifespan. Full Article
pe Concentration of the spectral norm of Erdős–Rényi random graphs By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Gábor Lugosi, Shahar Mendelson, Nikita Zhivotovskiy. Source: Bernoulli, Volume 26, Number 3, 2253--2274.Abstract: We present results on the concentration properties of the spectral norm $\|A_{p}\|$ of the adjacency matrix $A_{p}$ of an Erdős–Rényi random graph $G(n,p)$. First, we consider the Erdős–Rényi random graph process and prove that $\|A_{p}\|$ is uniformly concentrated over the range $p\in[C\log n/n,1]$. The analysis is based on delocalization arguments, uniform laws of large numbers, together with the entropy method to prove concentration inequalities. As an application of our techniques, we prove sharp sub-Gaussian moment inequalities for $\|A_{p}\|$ for all $p\in[c\log^{3}n/n,1]$ that improve the general bounds of Alon, Krivelevich, and Vu ( Israel J. Math. 131 (2002) 259–267) and some of the more recent results of Erdős et al. ( Ann. Probab. 41 (2013) 2279–2375). Both results are consistent with the asymptotic result of Füredi and Komlós ( Combinatorica 1 (1981) 233–241) that holds for fixed $p$ as $n\to\infty$. Full Article
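As a quick numerical illustration of the concentration described above (a sketch of my own, not code from the paper; the values of `n`, `p`, and the number of repetitions are arbitrary choices), one can sample adjacency matrices of $G(n,p)$ and observe that their spectral norms cluster tightly, near $np$ as in the Füredi–Komlós asymptotics:

```python
import numpy as np

def erdos_renyi_spectral_norm(n, p, rng):
    """Spectral norm of the adjacency matrix of one G(n, p) sample."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    A = upper + upper.T                      # symmetric 0/1 adjacency matrix
    return np.linalg.norm(A, 2)              # largest singular value

rng = np.random.default_rng(0)
n, p = 500, 0.1
samples = [erdos_renyi_spectral_norm(n, p, rng) for _ in range(20)]
print(np.mean(samples), np.std(samples))     # mean close to n*p, small spread
```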
pe Scaling limits for super-replication with transient price impact By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Peter Bank, Yan Dolinsky. Source: Bernoulli, Volume 26, Number 3, 2176--2201.Abstract: We prove a scaling limit theorem for the super-replication cost of options in a Cox–Ross–Rubinstein binomial model with transient price impact. The correct scaling turns out to keep the market depth parameter constant while resilience over fixed periods of time grows in inverse proportion with the duration between trading times. For vanilla options, the scaling limit is found to coincide with the one obtained by PDE-methods in ( Math. Finance 22 (2012) 250–276) for models with purely temporary price impact. These models are a special case of our framework and so our probabilistic scaling limit argument allows one to expand the scope of the scaling limit result to path-dependent options. Full Article
pe Directional differentiability for supremum-type functionals: Statistical applications By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Javier Cárcamo, Antonio Cuevas, Luis-Alberto Rodríguez. Source: Bernoulli, Volume 26, Number 3, 2143--2175.Abstract: We show that various functionals related to the supremum of a real function defined on an arbitrary set or a measure space are Hadamard directionally differentiable. We specifically consider the supremum norm, the supremum, the infimum, and the amplitude of a function. The (usually non-linear) derivatives of these maps adopt simple expressions under suitable assumptions on the underlying space. As an application, we improve and extend to the multidimensional case the results in Raghavachari ( Ann. Statist. 1 (1973) 67–73) regarding the limiting distributions of Kolmogorov–Smirnov type statistics under the alternative hypothesis. Similar results are obtained for analogous statistics associated with copulas. We additionally solve an open problem about the Berk–Jones statistic proposed by Jager and Wellner (In A Festschrift for Herman Rubin (2004) 319–331 IMS). Finally, the asymptotic distribution of maximum mean discrepancies over Donsker classes of functions is derived. Full Article
pe Perfect sampling for Gibbs point processes using partial rejection sampling By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Sarat B. Moka, Dirk P. Kroese. Source: Bernoulli, Volume 26, Number 3, 2082--2104.Abstract: We present a perfect sampling algorithm for Gibbs point processes, based on the partial rejection sampling of Guo, Jerrum and Liu (In STOC’17 – Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (2017) 342–355 ACM). Our particular focus is on pairwise interaction processes, penetrable spheres mixture models and area-interaction processes, with a finite interaction range. For an interaction range $2r$ of the target process, the proposed algorithm can generate a perfect sample with $O(\log(1/r))$ expected running time complexity, provided that the intensity of the points is not too high and $\Theta(1/r^{d})$ parallel processor units are available. Full Article
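For intuition only, here is a naive full-rejection sampler for a hard-core Gibbs process on the unit square. This is deliberately simpler than the partial rejection scheme of Guo, Jerrum and Liu used in the paper, and its acceptance probability degrades exponentially as the intensity grows; the parameters `lam` and `r` are arbitrary illustrative choices.

```python
import numpy as np

def hard_core_rejection(lam, r, rng, max_tries=10_000):
    """Naive full-rejection sampler: draw a Poisson(lam) process on [0,1]^2
    and accept only if no two points lie within interaction range 2r."""
    for _ in range(max_tries):
        n = rng.poisson(lam)
        pts = rng.random((n, 2))
        if n < 2:
            return pts
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # ignore self-distances
        if d.min() > 2 * r:
            return pts
    raise RuntimeError("no configuration accepted")

rng = np.random.default_rng(0)
sample = hard_core_rejection(lam=20, r=0.02, rng=rng)
print(len(sample))
```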
pe Optimal functional supervised classification with separation condition By projecteuclid.org Published On :: Mon, 27 Apr 2020 04:02 EDT Sébastien Gadat, Sébastien Gerchinovitz, Clément Marteau. Source: Bernoulli, Volume 26, Number 3, 1797--1831.Abstract: We consider the binary supervised classification problem with the Gaussian functional model introduced in ( Math. Methods Statist. 22 (2013) 213–225). Taking advantage of the Gaussian structure, we design a natural plug-in classifier and derive a family of upper bounds on its worst-case excess risk over Sobolev spaces. These bounds are parametrized by a separation distance quantifying the difficulty of the problem, and are proved to be optimal (up to logarithmic factors) through matching minimax lower bounds. Using the recent works of (In Advances in Neural Information Processing Systems (2014) 3437–3445 Curran Associates) and ( Ann. Statist. 44 (2016) 982–1009), we also derive a logarithmic lower bound showing that the popular $k$-nearest neighbors classifier is far from optimality in this specific functional setting. Full Article
pe On the probability distribution of the local times of diagonally operator-self-similar Gaussian fields with stationary increments By projecteuclid.org Published On :: Fri, 31 Jan 2020 04:06 EST Kamran Kalbasi, Thomas Mountford. Source: Bernoulli, Volume 26, Number 2, 1504--1534.Abstract: In this paper, we study the local times of vector-valued Gaussian fields that are ‘diagonally operator-self-similar’ and whose increments are stationary. Denoting the local time of such a Gaussian field around the spatial origin and over the temporal unit hypercube by $Z$, we show that there exists $\lambda\in(0,1)$ such that under some quite weak conditions, $\lim_{n\rightarrow+\infty}\frac{\sqrt[n]{\mathbb{E}(Z^{n})}}{n^{\lambda}}$ and $\lim_{x\rightarrow+\infty}\frac{-\log\mathbb{P}(Z>x)}{x^{\frac{1}{\lambda}}}$ both exist and are strictly positive (possibly $+\infty$). Moreover, we show that if the underlying Gaussian field is ‘strongly locally nondeterministic’, the above limits will be finite as well. These results are then applied to establish similar statements for the intersection local times of diagonally operator-self-similar Gaussian fields with stationary increments. Full Article
pe A characterization of the finiteness of perpetual integrals of Lévy processes By projecteuclid.org Published On :: Fri, 31 Jan 2020 04:06 EST Martin Kolb, Mladen Savov. Source: Bernoulli, Volume 26, Number 2, 1453--1472.Abstract: We derive a criterion for the almost sure finiteness of perpetual integrals of Lévy processes, for a class of real functions including all continuous functions and for general one-dimensional Lévy processes that drift to plus infinity. This generalizes previous work of Döring and Kyprianou, who considered Lévy processes having a local time, leaving the general case as an open problem. It turns out that the criterion simplifies significantly when the process has a local time, but we also demonstrate that in general our criterion cannot be reduced. This answers an open problem posed in ( J. Theoret. Probab. 29 (2016) 1192–1198). Full Article
pe Recurrence of multidimensional persistent random walks. Fourier and series criteria By projecteuclid.org Published On :: Fri, 31 Jan 2020 04:06 EST Peggy Cénac, Basile de Loynes, Yoann Offret, Arnaud Rousselle. Source: Bernoulli, Volume 26, Number 2, 858--892.Abstract: The recurrence and transience of persistent random walks built from variable length Markov chains are investigated. It turns out that these stochastic processes can be seen as Lévy walks for which the persistence times depend on some internal Markov chain: they admit Markov random walk skeletons. A recurrence versus transience dichotomy is highlighted. Assuming the positive recurrence of the driving chain, a sufficient Fourier criterion for the recurrence, close to the usual Chung–Fuchs one, is given and a series criterion is derived. The key tool is the Nagaev–Guivarc’h method. Finally, we focus on particular two-dimensional persistent random walks, including directionally reinforced random walks, for which necessary and sufficient Fourier and series criteria are obtained. Inspired by ( Adv. Math. 208 (2007) 680–698), we produce a genuine counterexample to the conjecture of ( Adv. Math. 117 (1996) 239–252). As for the one-dimensional case studied in ( J. Theoret. Probab. 31 (2018) 232–243), it is easier for a persistent random walk than its skeleton to be recurrent. However, such examples are much more difficult to exhibit in the higher dimensional context. These results are based on a surprisingly novel – to our knowledge – upper bound for the Lévy concentration function associated with symmetric distributions. Full Article
pe Stochastic differential equations with a fractionally filtered delay: A semimartingale model for long-range dependent processes By projecteuclid.org Published On :: Fri, 31 Jan 2020 04:06 EST Richard A. Davis, Mikkel Slot Nielsen, Victor Rohde. Source: Bernoulli, Volume 26, Number 2, 799--827.Abstract: In this paper, we introduce a model, the stochastic fractional delay differential equation (SFDDE), which is based on the linear stochastic delay differential equation and produces stationary processes with hyperbolically decaying autocovariance functions. The model departs from the usual way of incorporating this type of long-range dependence into a short-memory model as it is obtained by applying a fractional filter to the drift term rather than to the noise term. The advantages of this approach are that the corresponding long-range dependent solutions are semimartingales and the local behavior of the sample paths is unaffected by the degree of long memory. We prove existence and uniqueness of solutions to the SFDDEs and study their spectral densities and autocovariance functions. Moreover, we define a subclass of SFDDEs which we study in detail and relate to the well-known fractionally integrated CARMA processes. Finally, we consider the task of simulating from the defining SFDDEs. Full Article
pe Tail expectile process and risk assessment By projecteuclid.org Published On :: Tue, 26 Nov 2019 04:00 EST Abdelaati Daouia, Stéphane Girard, Gilles Stupfler. Source: Bernoulli, Volume 26, Number 1, 531--556.Abstract: Expectiles define a least squares analogue of quantiles. They are determined by tail expectations rather than tail probabilities. For this reason and many other theoretical and practical merits, expectiles have recently received a lot of attention, especially in actuarial and financial risk management. Their estimation, however, typically requires considering non-explicit asymmetric least squares estimates rather than the traditional order statistics used for quantile estimation. This makes the study of the tail expectile process a lot harder than that of the standard tail quantile process. Under the challenging model of heavy-tailed distributions, we derive joint weighted Gaussian approximations of the tail empirical expectile and quantile processes. We then use this powerful result to introduce and study new estimators of extreme expectiles and the standard quantile-based expected shortfall, as well as a novel expectile-based form of expected shortfall. Our estimators are built on general weighted combinations of both top order statistics and asymmetric least squares estimates. Some numerical simulations and applications to actuarial and financial data are provided. Full Article
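The asymmetric least squares characterization mentioned above can be made concrete with a small iteratively reweighted estimator (my own minimal sketch, not one of the estimators studied in the paper): the $\tau$-expectile minimizes $\sum_i|\tau-\mathbf{1}\{x_i<m\}|(x_i-m)^2$, and the first-order condition yields a weighted-mean fixed point.

```python
import numpy as np

def expectile(x, tau, tol=1e-10, max_iter=1000):
    """tau-expectile of a sample: the asymmetric least squares analogue
    of the tau-quantile, found by iteratively reweighted means."""
    x = np.asarray(x, dtype=float)
    m = x.mean()                                # tau = 0.5 gives the mean
    for _ in range(max_iter):
        w = np.where(x > m, tau, 1.0 - tau)     # asymmetric squared-error weights
        m_new = np.sum(w * x) / np.sum(w)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
print(expectile(x, 0.5), expectile(x, 0.9))     # the mean, and a value above it
```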
pe Operator-scaling Gaussian random fields via aggregation By projecteuclid.org Published On :: Tue, 26 Nov 2019 04:00 EST Yi Shen, Yizao Wang. Source: Bernoulli, Volume 26, Number 1, 500--530.Abstract: We propose an aggregated random-field model, and investigate the scaling limits of the aggregated partial-sum random fields. In this model, each copy in the aggregation is a $\pm 1$-valued random field built from two correlated one-dimensional random walks, the law of each determined by a random persistence parameter. A flexible joint distribution of the two parameters is introduced, and given the parameters the two correlated random walks are conditionally independent. For the aggregated random field, when the persistence parameters are independent, the scaling limit is a fractional Brownian sheet. When the persistence parameters are tail-dependent, characterized in the framework of multivariate regular variation, the scaling limit is more delicate, and in particular depends on the growth rates of the underlying rectangular region along two directions: at different rates different operator-scaling Gaussian random fields appear as the region area tends to infinity. In particular, at the so-called critical speed, a large family of Gaussian random fields with long-range dependence arise in the limit. We also identify four different regimes at non-critical speed where fractional Brownian sheets arise in the limit. Full Article
pe Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates By projecteuclid.org Published On :: Tue, 26 Nov 2019 04:00 EST Zhuang Ma, Xiaodong Li. Source: Bernoulli, Volume 26, Number 1, 432--470.Abstract: Canonical correlation analysis (CCA) is a fundamental statistical tool for exploring the correlation structure between two sets of random variables. In this paper, motivated by the recent success of applying CCA to learn low dimensional representations of high dimensional objects, we propose two losses based on the principal angles between the model spaces spanned by the sample canonical variates and their population correspondents, respectively. We further characterize the non-asymptotic error bounds for the estimation risks under the proposed error metrics, which reveal how the performance of sample CCA depends adaptively on key quantities including the dimensions, the sample size, the condition number of the covariance matrices and particularly the population canonical correlation coefficients. The optimality of our uniform upper bounds is also justified by lower-bound analysis based on stringent and localized parameter spaces. To the best of our knowledge, our paper is the first to separate $p_{1}$ and $p_{2}$ in the first order term of the upper bounds without assuming the residual correlations are zero. More significantly, our paper derives $(1-\lambda_{k}^{2})(1-\lambda_{k+1}^{2})/(\lambda_{k}-\lambda_{k+1})^{2}$ for the first time in the non-asymptotic CCA estimation convergence rates, which is essential to understand the behavior of CCA when the leading canonical correlation coefficients are close to $1$. Full Article
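For reference, sample canonical correlation coefficients can be computed by whitening the two blocks and taking an SVD of the whitened cross-covariance (a standard textbook construction, not the paper's loss analysis; the simulated data and correlation strength below are arbitrary):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlation coefficients between two data matrices
    (rows = observations), via whitening and an SVD of the cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    def inv_sqrt(C):
        # inverse symmetric square root via an eigendecomposition
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)  # singular values, descending

# two 3-dimensional views sharing one latent factor z
rng = np.random.default_rng(0)
z = rng.standard_normal((2000, 1))
X = np.hstack([z + 0.5 * rng.standard_normal((2000, 1)),
               rng.standard_normal((2000, 2))])
Y = np.hstack([z + 0.5 * rng.standard_normal((2000, 1)),
               rng.standard_normal((2000, 2))])
print(canonical_correlations(X, Y))
```

The singular values lie in $[0,1]$ and the leading one tracks the shared latent factor (population value $0.8$ in this setup).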
pe SPDEs with fractional noise in space: Continuity in law with respect to the Hurst index By projecteuclid.org Published On :: Tue, 26 Nov 2019 04:00 EST Luca M. Giordano, Maria Jolis, Lluís Quer-Sardanyons. Source: Bernoulli, Volume 26, Number 1, 352--386.Abstract: In this article, we consider the quasi-linear stochastic wave and heat equations on the real line and with an additive Gaussian noise which is white in time and behaves in space like a fractional Brownian motion with Hurst index $H\in(0,1)$. The drift term is assumed to be globally Lipschitz. We prove that the solution of each of the above equations is continuous in terms of the index $H$, with respect to the convergence in law in the space of continuous functions. Full Article
pe Weak convergence of quantile and expectile processes under general assumptions By projecteuclid.org Published On :: Tue, 26 Nov 2019 04:00 EST Tobias Zwingmann, Hajo Holzmann. Source: Bernoulli, Volume 26, Number 1, 323--351.Abstract: We show weak convergence of quantile and expectile processes to Gaussian limit processes in the space of bounded functions endowed with an appropriate semimetric which is based on the concepts of epi- and hypo-convergence as introduced in A. Bücher, J. Segers and S. Volgushev (2014), ‘When Uniform Weak Convergence Fails: Empirical Processes for Dependence Functions and Residuals via Epi- and Hypographs’, Annals of Statistics 42. We impose assumptions for which it is known that weak convergence with respect to the supremum norm generally fails to hold. For quantiles, we consider stationary observations, where the marginal distribution function is assumed to be strictly increasing and continuous except for finitely many points and to admit strictly positive – possibly infinite – left- and right-sided derivatives. For expectiles, we focus on independent and identically distributed (i.i.d.) observations. Only a finite second moment and continuity at the boundary points but no further smoothness properties of the distribution function are required. We also show consistency of the bootstrap for this mode of convergence in the i.i.d. case for quantiles and expectiles. Full Article
pe Prediction and estimation consistency of sparse multi-class penalized optimal scoring By projecteuclid.org Published On :: Tue, 26 Nov 2019 04:00 EST Irina Gaynanova. Source: Bernoulli, Volume 26, Number 1, 286--322.Abstract: Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the corresponding prediction and estimation consistency results have been lacking. We bridge this gap by providing probabilistic bounds on out-of-sample prediction error and estimation error of multi-class penalized optimal scoring allowing for diverging number of classes. Full Article
pe A new method for obtaining sharp compound Poisson approximation error estimates for sums of locally dependent random variables By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Michael V. Boutsikas, Eutichia Vaggelatou. Source: Bernoulli, Volume 16, Number 2, 301--330.Abstract: Let $X_{1},X_{2},\ldots,X_{n}$ be a sequence of independent or locally dependent random variables taking values in $\mathbb{Z}_{+}$. In this paper, we derive sharp bounds, via a new probabilistic method, for the total variation distance between the distribution of the sum $\sum_{i=1}^{n}X_{i}$ and an appropriate Poisson or compound Poisson distribution. These bounds include a factor which depends on the smoothness of the approximating Poisson or compound Poisson distribution. This “smoothness factor” is of order $O(\sigma^{-2})$, according to a heuristic argument, where $\sigma^{2}$ denotes the variance of the approximating distribution. In this way, we offer sharp error estimates for a large range of values of the parameters. Finally, specific examples concerning appearances of rare runs in sequences of Bernoulli trials are presented by way of illustration. Full Article
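The total variation distance appearing above can be computed exactly in a toy case. The sketch below (my own illustration, not the paper's method; the parameters are arbitrary) compares the law of a sum of independent Bernoulli variables with the matching-mean Poisson law; Le Cam's inequality bounds the distance by $\sum_{i}p_{i}^{2}$.

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities on {0, ..., kmax}."""
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)])

def bernoulli_sum_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i), by convolution."""
    pmf = np.array([1.0])
    for p in ps:
        pmf = np.convolve(pmf, [1 - p, p])
    return pmf

ps = np.full(50, 0.05)                 # 50 independent Bernoulli(0.05) trials
lam = ps.sum()                         # matching-mean Poisson parameter
q = bernoulli_sum_pmf(ps)              # supported on {0, ..., 50}
pois = poisson_pmf(lam, len(q) - 1)
# TV distance; the Poisson mass beyond n contributes the trailing term
tv = 0.5 * (np.abs(q - pois).sum() + (1 - pois.sum()))
print(tv, ps @ ps)                     # exact distance vs. Le Cam bound
```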
pe English given names : popularity, spelling variants, diminutives and abbreviations / by Carol Baxter. By www.catalog.slsa.sa.gov.au Published On :: Names, Personal -- England. Full Article
pe High on the hill : the people of St Philip & St James Church, Old Noarlunga / City of Onkaparinga. By www.catalog.slsa.sa.gov.au Published On :: St. Philip and St. James Church (Noarlunga, S.A.) Full Article
pe From Westphalia to South Australia : the story of Franz Heinrich Ernst Siekmann / by Peter Brinkworth. By www.catalog.slsa.sa.gov.au Published On :: Siekmann, Francis Heinrich Ernst, 1830-1917. Full Article
pe By the richest of God's grace / Anna Penney. By www.catalog.slsa.sa.gov.au Published On :: Penney, Anna -- Travels. Full Article
pe Welsh given names : popularity, spelling variants, diminutives and abbreviations / by Carol Baxter. By www.catalog.slsa.sa.gov.au Published On :: Names, Personal -- Welsh. Full Article
pe Scottish given names : popularity, spelling variants, diminutives and abbreviations / by Carol Baxter. By www.catalog.slsa.sa.gov.au Published On :: Names, Personal -- Scottish. Full Article
pe South Australian history sources / by Andrew Guy Peake. By www.catalog.slsa.sa.gov.au Published On :: South Australia -- History -- Sources. Full Article
pe Living through English history : stories of the Urlwin, Brittridge, Vasper, Partridge and Ellerby families / Janet McLeod. By www.catalog.slsa.sa.gov.au Published On :: Urlwin (Family). Full Article
pe Cook family history papers By www.catalog.slsa.sa.gov.au Published On :: Cook, William, 1815-1897 Full Article
pe Slow train to Auschwitz : memoirs of a life in war and peace / Peter Kraus. By www.catalog.slsa.sa.gov.au Published On :: Kraus, Peter -- Biography. Full Article