Imputation and post-selection inference in models with missing data: An application to colorectal cancer surveillance guidelines

By projecteuclid.org
Published On :: Wed, 16 Oct 2019 22:03 EDT

Lin Liu, Yuqi Qiu, Loki Natarajan, Karen Messer.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1370--1396.

Abstract:
It is common to encounter missing data among the potential predictor variables in the setting of model selection. For example, in a recent study we attempted to improve the US guidelines for risk stratification after screening colonoscopy (Cancer Causes Control 27 (2016) 1175–1185), with the aim to help reduce both overuse and underuse of follow-on surveillance colonoscopy. The goal was to incorporate selected additional informative variables into a neoplasia risk-prediction model, going beyond the three currently established risk factors, using a large dataset pooled from seven different prospective studies in North America. Unfortunately, not all candidate variables were collected in all studies, so that one or more important potential predictors were missing on over half of the subjects. Thus, while variable selection was a main focus of the study, it was necessary to address the substantial amount of missing data. Multiple imputation can effectively address missing data, and there are also good approaches to incorporate the variable selection process into model-based confidence intervals. However, there is not consensus on appropriate methods of inference which address both issues simultaneously. Our goal here is to study the properties of model-based confidence intervals in the setting of imputation for missing data followed by variable selection. We use both simulation and theory to compare three approaches to such post-imputation-selection inference: a multiple-imputation approach based on Rubin’s Rules for variance estimation (Comput. Statist. Data Anal. 71 (2014) 758–770); a single imputation-selection followed by bootstrap percentile confidence intervals; and a new bootstrap model-averaging approach presented here, following Efron (J. Amer. Statist. Assoc. 109 (2014) 991–1007). We investigate relative strengths and weaknesses of each method. The “Rubin’s Rules” multiple imputation estimator can have severe undercoverage, and is not recommended. The imputation-selection estimator with bootstrap percentile confidence intervals works well. The bootstrap-model-averaged estimator, with the “Efron’s Rules” estimated variance, may be preferred if the true effect sizes are moderate. We apply these results to the colorectal neoplasia risk-prediction problem which motivated the present work.
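
For readers unfamiliar with the first of the three approaches, the standard Rubin's Rules pooling step can be sketched as follows. This is only a minimal illustration of the textbook formula (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance). It is not the authors' code and does not reproduce the variable-selection step studied in the paper. The function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Hypothetical helper illustrating the textbook Rubin's Rules formula;
    it ignores any model-selection step applied within each imputation.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up coefficient estimates and variances from m = 3 imputed datasets
est, total_var = pool_rubin([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
```

According to the abstract, intervals built from this pooled variance can undercover severely when variable selection follows imputation, which is why the authors favor the bootstrap-based alternatives.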

Introduction to papers on the modeling and analysis of network data—II

Stratonovich type integration with respect to fractional Brownian motion with Hurst parameter less than $1/2$

Local law and Tracy–Widom limit for sparse stochastic block models

Frequency domain theory for functional time series: Variance decomposition and an invariance principle

Bayesian linear regression for multivariate responses under group sparsity

Concentration of the spectral norm of Erdős–Rényi random graphs

On Sobolev tests of uniformity on the circle with an extension to the sphere

Exponential integrability and exit times of diffusions on sub-Riemannian and metric measure spaces

Scaling limits for super-replication with transient price impact

Noncommutative Lebesgue decomposition and contiguity with applications in quantum statistics

First-order covariance inequalities via Stein’s method

On estimation of nonsmooth functionals of sparse normal means

On sampling from a log-concave density using kinetic Langevin diffusions

Busemann functions and semi-infinite O’Connell–Yor polymers

On the best constant in the martingale version of Fefferman’s inequality

Functional weak limit theorem for a local empirical process of non-stationary time series and its application

Logarithmic Sobolev inequalities for finite spin systems and applications

Kernel and wavelet density estimators on manifolds and more general metric spaces

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

On the eigenproblem for Gaussian bridges

Sojourn time dimensions of fractional Brownian motion

Random orthogonal matrices and the Cayley transform

On the probability distribution of the local times of diagonally operator-self-similar Gaussian fields with stationary increments

Around the entropic Talagrand inequality

The moduli of non-differentiability for Gaussian random fields with stationary increments

Stratonovich stochastic differential equation with irregular coefficients: Girsanov’s example revisited

On stability of traveling wave solutions for integro-differential equations related to branching Markov processes

A new McKean–Vlasov stochastic interpretation of the parabolic–parabolic Keller–Segel model: The one-dimensional case

Rates of convergence in de Finetti’s representation theorem, and Hausdorff moment problem

Dynamic linear discriminant analysis in high dimensional space

Consistent structure estimation of exponential-family random graph models with block structure

Characterization of probability distribution convergence in Wasserstein distance by $L^{p}$-quantization error function

Interacting reinforced stochastic processes: Statistical inference based on the weighted empirical means

A Bayesian nonparametric approach to log-concave density estimation

Stable processes conditioned to hit an interval continuously from the outside

Distances and large deviations in the spatial preferential attachment model

Recurrence of multidimensional persistent random walks. Fourier and series criteria

Stochastic differential equations with a fractionally filtered delay: A semimartingale model for long-range dependent processes

Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces

A Feynman–Kac result via Markov BSDEs with generalised drivers

Robust modifications of U-statistics and applications to covariance estimation problems

A unified approach to coupling SDEs driven by Lévy noise and some applications

On frequentist coverage errors of Bayesian credible sets in moderately high dimensions

Normal approximation for sums of weighted $U$-statistics – application to Kolmogorov bounds in random subgraph counting

Tail expectile process and risk assessment

Operator-scaling Gaussian random fields via aggregation

Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates

High dimensional deformed rectangular matrices with applications in matrix denoising

The Finish Line: Cast Stone and EIFS

The Finish Line: Changing Stucco to EIFS

The Finish Line: Cleaning EIFS

The Finish Line: Earthquakes and EIFS

The Finish Line: Adhesives vs. Mechanical Fasteners

The Finish Line: EPS Vs. Polyisocyanurate Insulation

The Finish Line: Sealants

The Finish Line: Building Walls in the Land Down Under

EPDs, HPDs and Red Lists (Oh My)!

Building Product Transparency— Be Careful What You Ask For

Anti-LEED Legislation

An Energy Label for Buildings

Benefits of the Variable Refrigerant Flow

New Gadget Analyzes Everything Including Building Industry

ANSI Green Globes 2015
